Edge sign prediction based on a combination of network structural topology and sign propagation

Abstract

The prediction of edge signs in social and biological networks is a major goal of graph-based machine learning and has important implications for recommendation systems. Most current edge sign prediction methods rely on information propagation from neighbouring edges, either directly, by assuming sign similarity between neighbouring edges, or through more complex theories based on combinations of edge signs in neighbours. Such methods rely on a high network sampling fraction and fail at low sampling levels. We show here that edges with similar network topology, as defined by a combination of network measures, have similar signs. This surprising correlation between network topology and edge sign can be used for prediction. Indeed, machine learning algorithms based on this topology can produce higher accuracy than state-of-the-art methods on standard datasets, even when only a very small fraction of the edge signs is known, with an accuracy of up to 93%. We further show that different datasets differ in the importance of different features. A combination of features is always required to obtain a high area under the curve. When the vertices represent people, the sign is mainly affected by the edge target. When the network represents opinions, the signs are mainly affected by the edge source. The proposed method can be applied to directed and undirected, weighted and unweighted networks.

1. Introduction

Edges in social and biological networks can have different signs, representing, among many other possibilities, people with a positive or negative attitude towards other people, or proteins/neurons that activate or inhibit other proteins/neurons. Edge sign prediction has attracted much interest, mainly in social networks [1]. Sign prediction has multiple real-world applications, such as testing the legitimacy of connections between people and detecting edges that have predictive power for the properties of vertices. The edge sign prediction problem can be defined as follows: assume a network $$G = (V,E)$$, where $$V$$ are vertices and $$E$$ are edges, with signs on all edges, but with only a limited fraction of the signs known. The edge sign prediction problem is to reliably infer the sign of an edge with a hidden sign, using $$V$$, $$E$$ and the known signs.

Most edge sign predictions are based on the assumption that neighbouring edges have similar signs, and thus edge signs can be estimated using sign propagation algorithms [2]. Similar principles of homophily and peer influence have been applied to vertex colour/sign prediction [3, 4]. Note that while it may be hard to distinguish between homophily and peer influence, such an identification is not required for prediction. Sign propagation models are based on two main social theories: 'balance' and 'status'. The 'balance' theory represents assumptions such as 'the friend of my friend is my friend' (i.e., positive ($$u,v$$) and ($$v,w$$) edges increase the probability of a positive ($$u,w$$) edge), 'the enemy of my friend is my enemy' and similar combinations [5]. The 'status' theory represents the assumption that one refers positively to a person of higher status and negatively to a person of lower status [6]. A crucial limitation of such theories for machine learning applications is the requirement for a large enough number of edges with known signs surrounding the edge for which the sign is predicted.
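As a concrete illustration, the balance-theory combination rule reduces to a product of signs. The following minimal sketch (our illustration; the function name is hypothetical and not from the original paper) makes this explicit:

```python
def balance_prediction(sign_uv: int, sign_vw: int) -> int:
    """Balance-theory rule for the sign of (u, w), given the signs of
    (u, v) and (v, w).

    With signs encoded as +1/-1, 'the friend of my friend is my friend',
    'the enemy of my friend is my enemy' and the remaining combinations
    all reduce to the product of the two known signs.
    """
    return sign_uv * sign_vw
```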
Formally, assume a graph with a positive or negative sign on each edge. Our goal is to predict the sign of edges with unknown signs, using a limited fraction of signed edges (e.g., 20% of known signs, with the target of predicting the other 80%), denoted here as the partial information setup. Such a setup often occurs when ties are known, but the sign is unknown for a large fraction of the observed ties. Note that this differs from most sign-prediction setups, where one assumes that the signs of all edges but one are known and attempts to predict the sign of that one edge [2, 7], denoted here as the full information setup. The full information setup is obviously easier to solve, since in a network with a high average degree, the vast majority of edges have neighbours with known signs.

We here propose that a different principle may be used to predict edge signs even in sparsely sampled networks. We have recently proposed, based on theoretical and experimental observations, that the dynamics affecting edge and node signs are coupled with the edge addition and deletion dynamics [8, 9]. This coupling suggests that the sign of an edge and the topology of the graph around it may also be coupled, since the graph topology is determined by the edge addition and removal dynamics. Such a coupling would emerge through two possible mechanisms. Either the sign of ($$u,v$$) can affect the network dynamics of vertices $$u$$ and $$v$$, and these dynamics are mirrored in the network attributes of $$u$$ and $$v$$ and the relation between them, or the sign of an edge is determined by the network structure around the edge. In both cases, edges with similar topological features may have similar signs. For example, if feedback cycles encourage positive reactions, then the number of coherent cycles ($$u \to v \to w \to u$$) may be correlated with a positive edge sign. In such a case, this feature may be used to predict the edge sign. Over the past decade, multiple studies by ourselves and others have proposed correlations between vertex/edge properties and graph structure [10–13]. Herein, we show that the same concept holds for edge signs, but that this relation is highly network dependent. Moreover, the correlation between the graph topology and the edge signs is strong enough to classify edges even in a low information setup, and when combined with very simple information propagation, it leads to better accuracies than those obtained in the full information setup for standard test problems. Using the current approach, even in networks where the signs are sparsely sampled, the remaining signs can be predicted with an accuracy ranging between 82% and 93% in multiple real-world networks.

The difference between the current approach and sign propagation approaches can be seen in Fig. 1. In Box 1, we can assume by sign propagation that the unknown edge will probably be positive. However, in Box 2, prediction of the unknown sign by sign propagation alone will be inaccurate, since both endpoints have low degrees and mixed edge signs. This becomes even more complicated if only a small fraction of the edges have known signs.
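Returning to the coherent-cycle example above, such a count can be computed directly from the unsigned topology. A minimal sketch, assuming a networkx directed graph (our illustration, not code from the paper):

```python
import networkx as nx

def coherent_cycle_count(G: nx.DiGraph, u, v) -> int:
    """Number of coherent cycles u -> v -> w -> u through the edge (u, v)."""
    return sum(1 for w in G.successors(v) if w != u and G.has_edge(w, u))
```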
Fig. 1. (A) Box 1: in this rectangle, edge signs are best computed by sign propagation. Box 2: here, sign propagation would prove difficult to use. Boxes 3 and 4 have different signs but similar topology, and may be used to infer from one to the other. (B) Thirteen different non-isomorphic directed graphs with three vertices are used in the current analysis. For each edge, we count how many times it is located in each of these motifs. The numbers in Fig. 3 correspond to the parallel numbers here. (C) For each vertex feature, we take the mean and the subtraction of the features of the target and source vertices to produce edge features. The subtraction is set to be source vertex minus target vertex, divided by two, with no loss of generality.

However, one may instead use network topologies, and classify edges based on the topology of the network surrounding them. If edges with similar topologies typically have similar signs, we can infer the sign of the current edge. Different topological elements can be used to classify edges, such as small-scale motifs (Fig. 1B), degree, centrality, clustering and other network features [14–17]. It is important to note that we do not use the sign of the edges in these features, so the features can be computed on all edges, even if the majority of them have hidden signs. Similar approaches have been developed to classify the colour of vertices (see, for example, among many others, [10, 11, 18]). We now apply the same principles to edge sign prediction.

2. Methods

2.1 Algorithm description

The sign prediction algorithm consists of three steps, as further detailed below:

1. Each edge is associated with a vector of topological edge properties (VOTE), such as centrality measures, degrees and small-scale motif frequencies.
2. Each dimension (attribute) is normalized to produce $$z$$ scores.
3. Neural networks and other machine learning approaches are applied to the VOTEs. The algorithms used here were, among others: Random Forest [19], AdaBoost [20], support vector machine (SVM) and feed forward neural networks. However, any other classifier can be used.

2.2 Vector of topological edge properties

We define for each edge in the network a VOTE. This VOTE is a set of discrete/continuous values representing a large set of local and global network measures. Note that the VOTE of an edge is not affected by its classification. The following measures were used for each VOTE (a sketch computing the vertex-level measures follows this list):

- In-degree and out-degree: the number of vertices connected to a vertex via an incoming or outgoing edge. The value for each edge is computed by taking the mean and the subtraction of the values of the two vertices surrounding the edge.
- Three-vertex small-scale motifs [21]: all possible directed connected sub-graphs with three vertices containing the current edge (taking isomorphisms into consideration) were counted. The well-known clustering coefficient measure was omitted due to its redundancy, since triplets and triangles are already counted in size-three motifs [22]. For each motif, we increment the count of all the edges that participate in the motif by one. Each edge thus has a vector of its contribution to the 13 possible motifs (Fig. 1B).
- Shortest path measures: the participation of a vertex in the shortest paths of the network is captured by closeness centrality [23], as well as by the first and second moments of the distance distribution, as estimated by the breadth-first search (BFS) algorithm. In order to translate vertex attributes to edge attributes in the VOTEs, we take the subtraction and the average of the vertex attributes (Fig. 1C).
- Extended walk measures: a measure that implements ranking by walks on the network beyond the shortest paths is Pagerank [17]. The Pagerank score expresses the probability of reaching a vertex by a random walk (random surfer model).
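A minimal sketch of these vertex-level measures follows; networkx is our choice of library, as the paper does not name its implementation, and only the non-motif measures are shown:

```python
import networkx as nx
import numpy as np

def vertex_measures(G: nx.DiGraph) -> dict:
    """Per-vertex measures of the kinds listed above."""
    measures = {
        "in_degree": dict(G.in_degree()),
        "out_degree": dict(G.out_degree()),
        "closeness": nx.closeness_centrality(G),
        "pagerank": nx.pagerank(G),
    }
    # First and second moments of the BFS distance distribution from each
    # vertex (unreachable vertices are simply absent from the BFS output).
    dist_mean, dist_m2 = {}, {}
    for v in G:
        d = np.array(list(nx.shortest_path_length(G, source=v).values()))
        dist_mean[v] = d.mean()
        dist_m2[v] = (d ** 2).mean()
    measures["dist_mean"] = dist_mean
    measures["dist_second_moment"] = dist_m2
    return measures
```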
2.3 Z scoring

The VOTEs contain a variety of measures that can be discrete or continuous, and bounded or unbounded (up to a very large number limited by a function of the total vertex or edge number). Network measures, such as degrees and motif frequencies, are often heavy tailed [24]. We therefore used their log to avoid extreme values. Non-negative attributes were offset by a small constant (0.01 in the current realization), so that zero values remain defined, before a base-10 log was applied. All measures were then $$z$$ scored to a 0 mean and a unit standard deviation, leading to the following $$z$$ score for a non-negative heavy-tailed variable $$x$$:
\begin{align}
\begin{split}
z&=\log_{10} (x+0.01), \\
z&=\frac{z-\left\langle z \right\rangle }{\mathrm{std}(z)}.
\end{split}
\end{align} (1)

2.4 Classification of edges based on their topological properties

The VOTE method can be implemented using any binary classifier. Herein, we tested it with several of the most common algorithms, all implemented using the Scikit-Learn python package. In addition, we applied deep learning using Keras feed forward neural networks, with a Theano backend. The percentages of train and test samples were 80% and 20%, respectively, unless explicitly stated otherwise. The following classifiers were used:

- AdaBoost classifier [25]: the maximum number of estimators at which boosting is terminated was set to 100. In case of a perfect fit, the learning procedure was stopped early.
- Random Forest (RF) classifier [26]: the number of trees in the forest was set to 1000. The splitting criterion was the Gini index. The maximum depth of a tree was set to 3. The minimum number of samples required to split an internal node was 15. The class weight was set to 'balanced', which automatically adjusts weights inversely proportional to class frequencies.
- SVM with stochastic gradient descent (SGD): we performed simple SGD with the standard SVM (hinge) loss function and standard SVM regularization [27].
- Deep learning: the network used was composed of three layers. We used the ReLU activation function, l1_l2 regularization at a rate of 0.4 and a dropout rate of 0.4 for the internal layers. The output layer activation function was a sigmoid. The loss function was binary cross-entropy. The network was fully connected [28–31]. All other parameters were the Keras defaults. The optimizer was the 'Adam' stochastic gradient optimizer.

2.5 Statistical analysis

The performance of the classifiers was evaluated using the AUC (area under the curve). The AUC is a common evaluation metric for binary classification problems, based on the relation between the false positive rate and the true positive rate. For the comparison with previous results, maximal accuracy was used for consistency.
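A minimal sketch of this pipeline under the stated hyperparameters; the hidden-layer widths (64 and 32) are our assumption, as the text does not specify them, and we use the modern tensorflow.keras API rather than the original Theano backend:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def z_score(x: np.ndarray) -> np.ndarray:
    """Eq. (1): base-10 log with a 0.01 offset, then standardization."""
    z = np.log10(x + 0.01)
    return (z - z.mean()) / z.std()

# Classifiers with the hyperparameters stated in the text; all other
# arguments are left at the library defaults.
classifiers = {
    "AdaBoost": AdaBoostClassifier(n_estimators=100),
    "RandomForest": RandomForestClassifier(
        n_estimators=1000, criterion="gini", max_depth=3,
        min_samples_split=15, class_weight="balanced"),
    "SVM-SGD": SGDClassifier(loss="hinge", penalty="l2"),  # linear SVM via SGD
}

def build_deep_model(n_features: int) -> keras.Model:
    """Feed-forward net as described: ReLU internal layers with l1_l2
    regularization and 0.4 dropout, a sigmoid output, binary cross-entropy
    loss and the Adam optimizer."""
    reg = regularizers.l1_l2(l1=0.4, l2=0.4)
    model = keras.Sequential([
        layers.Dense(64, activation="relu", kernel_regularizer=reg,
                     input_shape=(n_features,)),
        layers.Dropout(0.4),
        layers.Dense(32, activation="relu", kernel_regularizer=reg),
        layers.Dropout(0.4),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC()])
    return model
```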
3. Results

To show that topological features, such as the clustering coefficient, degree, small-scale motif distribution and centrality, are correlated with edge signs and can be used to predict them, we analysed multiple standard networks on which state-of-the-art algorithms were previously tested for edge sign prediction:

- Epinions: a reviews website, where users can trust ($$+$$) or distrust ($$-$$) each other's reviews. This dataset has 131 828 vertices and 841 372 edges. Signs in Epinions represent the attitude of the reviewer and the quality of the review.
- Slashdot: a tech blog where bloggers can like ($$+$$) or dislike ($$-$$) other bloggers' comments. This dataset has 82 144 vertices and 549 202 edges.
- Wikipedia: the requests for admin votes, where a positive link represents a vote for a user and a negative link represents a vote against. Some links have neutral signs, representing neutral votes; we did not include edges with neutral signs in the prediction. This dataset has 10 835 vertices and 159 388 edges.

For each of these networks, we computed for each edge a VOTE, using a combination of edge-specific properties and vertex-specific properties, the latter combined into edge properties through the difference and the mean of the two vertices connected by the edge (Fig. 1C). Formally, assume an edge $$(u,v)$$ from vertex $$u$$ to vertex $$v$$, with a set of edge attributes $$t_{(u,v)}$$, and vertex attributes $$m_{u}, m_{v}$$ for the two endpoint vertices. We define the VOTE of the edge $$(u,v)$$ to be:
\begin{equation}
V_{(u,v)} =\left[ {t_{(u,v)} ,\frac{m_{u} +m_{v} }{2},m_{v} -m_{u} } \right],
\end{equation} (2)
where all elements are row vectors. We chose the average and the difference, instead of the features of each vertex by itself, to produce an edge-centric approach, representing, for example, the average degree of an edge as the average number of edges pointing to either of the vertices surrounding the edge. The second element (the difference) incorporates the directionality by representing the difference between the source and target vertices of the edge. Note that other linear combinations could have been used, but since we are using a neural network approach, even if the current combination is not optimal, we expect the network weights to produce an optimal linear combination based on the current representation. In the current realization, the VOTE vector has 31 graph-topology based features: $$2 \times 2$$ in- and out-degrees, 13 motifs, $$5 \times 2$$ centrality measures and $$2 \times 2$$ community measures. It is easy to see that the topology features can be computed for each edge in the graph regardless of the sign of the edge and, more importantly, regardless of the signs of its neighbouring edges, in contrast with methods based on the neighbours' class.

To show that VOTEs are indeed correlated with the sign of edges, the Spearman correlation of each feature with the edge sign was computed. The correlations vary over a wide range, but some features, such as specific small-scale motifs, reach a correlation of $$-$$0.6 in the Epinions network. In other networks, the correlations are lower, but still very significant (Fig. 2).
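A minimal sketch of the VOTE construction of Eq. (2) and of the Spearman screening used for Fig. 2 (function names are ours; the sign and scale of the difference term differ from Fig. 1C only by a constant, which the text notes is without loss of generality):

```python
import numpy as np
from scipy.stats import spearmanr

def vote(t_uv: np.ndarray, m_u: np.ndarray, m_v: np.ndarray) -> np.ndarray:
    """Eq. (2): edge attributes, endpoint mean and (target - source)
    difference, concatenated into a single row vector."""
    return np.concatenate([t_uv, (m_u + m_v) / 2.0, m_v - m_u])

def sign_correlations(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Spearman correlation of each VOTE feature (column of X) with the
    edge sign y (+1/-1)."""
    return np.array([spearmanr(X[:, j], y).correlation
                     for j in range(X.shape[1])])
```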
Fig. 2. Spearman correlations between features and the plus sign in the different networks studied. The features are ranked from bottom to top according to the sum of the absolute values of their correlations in all three networks. One can clearly see that in the Epinions network the edge sign is the most strongly associated with network features, but that correlations exist in all networks. Moreover, the correlations of each feature differ between networks.

To show that such VOTEs can indeed detect edge signs, binary machine learning was applied to the VOTEs of edges with different classifications:
\begin{equation}
y_{(u,v)} \in \left\{ {+,-} \right\}.
\end{equation} (3)
Multiple learning methods were tested with $$V_{(u,v)}$$ as the input for each edge and $$y_{(u,v)}$$ as the output (blue bars in Fig. 3A, C and E). We used standard machine learning algorithms: 'AdaBoost' (Adaptive Boosting), 'Random Forest', 'SVM-SGD' (Stochastic Gradient Descent) and feed forward deep networks. On average over all networks and conditions, the deep learning classifier had the best performance on both train and test sets (last group of bars in Fig. 3 for test). The prediction of each category was estimated using the area under the receiver operating characteristic (ROC) curve (AUC). In order to compare our results to previous publications, we also report the maximal accuracy (i.e., the accuracy with the optimal threshold as computed on the train set; Table 1). The deep learning results were significantly better than random on all networks. In contrast, some of the boosting methods failed to produce significant results. In the following sections, we continue only with the deep learning results (rightmost bars in Fig. 3A, C and E). These results are computed in the low information setup, where the signs of the edges are divided into test and train compartments, with a test fraction of 0.2.

Fig. 3. AUC results ($$y$$ axes in all figures) for the test set using four different algorithms, with either only topological features (blue), sign propagation features (gray), or combined topology and sign propagation features (black). The train fraction in the left plots is 0.8 and the test fraction is 0.2. The right plots are parallel to the left plots, and represent the same results only for the deep learning, as a function of the train fraction. The different subplots are: (A and B) Wiki, (C and D) Slashdot, (E and F) Epinions. The error bars represent the standard errors over three random train/test divisions. The $$x$$ axes in B, D and F are on a log scale. One can clearly see that as the training size shrinks, the quality of the propagation-based methods decreases, but the AUC of the topology-based methods stays approximately constant.
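The 'maximal accuracy' metric reported in Table 1 is, in our reconstruction (the paper gives no code), the accuracy at the score threshold that maximizes the train-set accuracy, applied unchanged to the test set:

```python
import numpy as np
from sklearn.metrics import roc_curve

def max_accuracy(y_train, scores_train, y_test, scores_test) -> float:
    """Pick the threshold maximizing accuracy on the train set, then report
    test-set accuracy at that fixed threshold. Labels are assumed in {0, 1}."""
    fpr, tpr, thresholds = roc_curve(y_train, scores_train)
    pos = np.sum(y_train == 1)
    neg = np.sum(y_train == 0)
    # accuracy at each ROC threshold: (TP + TN) / N
    acc = (tpr * pos + (1 - fpr) * neg) / (pos + neg)
    best = thresholds[np.argmax(acc)]
    preds = (np.asarray(scores_test) >= best).astype(int)
    return float(np.mean(preds == y_test))
```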
Table 1. Comparison of maximal accuracy to previous state-of-the-art classifiers. The results represent the test set accuracy. The first column, Min(Em), is the minimal number of common neighbours between the source and target of the edges studied. The VOTE model is applied in a partial information setup, while all others are applied in the full information setup; still, the VOTE model outperforms existing algorithms in many categories. Some of the existing algorithms require a minimal number of common neighbours; the comparison was performed to match the setup of previous studies.

Network    Min(Em)   VOTE    Leskovec (2010)   Jalili (2017)
Wiki       0         0.82    0.8021            -
Wiki       1         0.825   -                 0.7927
Wiki       10        0.812   0.8               0.7508
Slashdot   0         0.836   0.8               -
Slashdot   1         0.85    -                 0.8217
Slashdot   10        0.892   0.89              0.8932
Epinions   0         0.9     0.865             -
Epinions   1         0.926   -                 0.9083
Epinions   10        0.924   0.93              0.9208

The learning results clearly show that the network topology contains information about the classification. However, this information may overlap completely with the information available to sign propagation approaches. In order to test whether the information available in the topology can be combined with sign propagation to improve accuracy, we reproduced a simple information propagation algorithm, in which four more dimensions were added to the classifier input: the numbers of $$+$$ and $$-$$ signs on the outgoing edges of the source vertex and on the incoming edges of the target vertex, within the train set:
\begin{align}
\begin{split}
S_{(u,v)} &=\left[ {+_{(u,w)} ,+_{(w,v)} ,-_{(u,w)} ,-_{(w,v)} } \right] \\
&\quad \text{s.t. } (u,w)\in \text{train},\ (w,v)\in \text{train}.
\end{split}
\end{align} (4)
Note that these features, in contrast with $$V_{(u,v)}$$, are sensitive to the train/test division. Four parallel features, counting the edges incoming to the source vertex and outgoing from the target vertex, could be added. However, following Leskovec et al. [2], we limit ourselves to these four features. In all networks studied, the precision obtained when using $$S_{(u,v)}$$ was higher than when using $$V_{(u,v)}$$, as expected given the sign similarity between neighbouring edges (gray vs. blue bars in Fig. 3). The combined method (using both $$S_{(u,v)}$$ and $$V_{(u,v)}$$; black bars in Fig. 3) always outperforms each one separately in the deep learning approach, suggesting that these two types of information are complementary. The deep neural network with the combined features outperforms the latest studies [32], with an AUC of 0.902 on the Wiki graph and an AUC of 0.97 on the Epinions graph, although we perform the analysis in the low information setup, while previous studies used the easier full information setup. Moreover, the current method is precise even when all edges are analysed, while previous methods have focused on edges with a minimal number of common neighbours between the source and target vertices (see Table 1 for a detailed comparison).

To test whether the train set size can be further reduced, we repeated the analysis with varying train set sizes. In the networks studied, the accuracy of the classification based on graph topology features is practically independent of the train size, down to a 5% train fraction. The sign propagation features perform better when the train set is large, since more signed close neighbours can be used. The combined classification follows the sign propagation (Fig. 3D–F).

In order to understand which features contribute to edge sign prediction, we computed, in the RF analysis, the contribution to the out-of-bag error. One can clearly see that, per feature, information propagation contributes more than the topological features (Fig. 4, last four rows) and is the most informative method, as previously shown [2]. However, information propagation is limited to far fewer features than topology, and the introduction of topology significantly improved the performance. Moreover, for very small training set sizes, topology by itself is as informative as information propagation. Finally, it allows for a precise prediction in low degree nodes, where information propagation is limited.

Fig. 4. (A) Contribution to the out-of-bag error for the most influential features in a random forest formalism. The features are ranked according to the sum of their contributions to the classification in all three networks. One can clearly see that different networks are affected by different features. (B) AUC as a function of the number of topological features used, when only topology is used. A high AUC is only obtained when a large number of features is used. Error bars are the standard error of the AUC over three random train/test divisions.
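A sketch of the four propagation features of Eq. (4), assuming a networkx directed graph and a train_signs dictionary mapping the known train-set edges to +1/-1 (both names are ours):

```python
import networkx as nx

def propagation_features(G: nx.DiGraph, train_signs: dict, u, v) -> list:
    """Counts of +/- signs on the source's outgoing edges and on the
    target's incoming edges, restricted to the train set (Eq. 4)."""
    out_pos = sum(1 for w in G.successors(u) if train_signs.get((u, w)) == +1)
    out_neg = sum(1 for w in G.successors(u) if train_signs.get((u, w)) == -1)
    in_pos = sum(1 for w in G.predecessors(v) if train_signs.get((w, v)) == +1)
    in_neg = sum(1 for w in G.predecessors(v) if train_signs.get((w, v)) == -1)
    return [out_pos, in_pos, out_neg, in_neg]
```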
Within the information propagation features, a clear difference was observed between the different networks. While for the Wiki the numbers of positive and negative edges pointing to the vertex are the most influential, for the Slashdot network the most influential feature is the number of negative edges that the vertex points to. For the Epinions network, the main contribution actually comes mainly from structural features, such as Pagerank, the closeness and motifs 2 and 7. Finally, for the Wiki, the K core is the most influential structural feature. The difference in feature contributions among networks further reinforces the need to use a large array of features, and not to limit the analysis to a few specific features, as we and others have previously done for vertex classification [12, 33, 34]. The interpretation of these opposite contributions is further discussed below.

In order to further ensure that the topology-based classification is not the artefact of a single trivial highly correlated feature (e.g., the degree), we sorted all features based on their Spearman correlation coefficients, and only used the 5, 10 or 20 most correlated/anticorrelated features. One can clearly see (Fig. 4B) that the AUC increases with the number of features used, suggesting a classification based on a combination of features, each contributing a limited amount of information, as expected from the out-of-bag error (Fig. 4A).

4. Discussion

Most current edge sign prediction methods are based on local information propagation. The sign of an edge is assumed to be correlated with the signs of neighbouring edges, or with combinations of such signs. These methods fail for edges not surrounded by edges with known signs, such as isolated edges, or edges in networks with limited sign information. We here propose an alternative approach. We assume that the topological features of the network surrounding an edge (not only the ego-network, but the properties of the edge in the full network) are correlated with its sign. This can happen, for example, if both network structure and sign are affected by a common driving mechanism (e.g., friendly people tend to create positive signs, but also higher degrees). We and others have previously developed such models for vertex prediction, and termed this the Network Attribute Vector (NAV) approach [11]. In the NAV approach, relational information, as represented by an interaction network, is translated into real-number vectors, taking advantage of the rich characterization of structural attributes accumulated in network science. For the NAV, we selected a variety of properties on local and global scales. These properties are far from being the only possible ones, and many other similar combinations are possible, with probably similar results. This concept relies on evidence that motifs and centrality measures are associated with vertex function. For example, a small set of motifs is abundant in gene regulation, subgraph centrality and total communicability can be used to identify prominent proteins, and hierarchy energy and flow measures enable differentiating between specific neuronal populations by their position along the signal propagation.
This led to the conjecture that underlies the NAV algorithm: specific subsets of vertices have typical topological profiles. Given the relation between NAVs and the properties of vertices (e.g., sub-cellular location), we here extend the concept to edges, and propose a model where edge signs can be learnt based on the parallel of NAVs for edges: VOTEs. This required the development of topological features for edges. Such features can be based, for example, on a combination of the features of the vertices surrounding the edge. Another alternative is the development of edge-specific measures, such as the frequency of sub-graphs containing a specific edge. As was the case for vertices, the current set of features studied is far from comprehensive, and other parallel features can be developed. We term the feature vector for edges the VOTE. The information available in the VOTE is complementary to the information that can be obtained through information propagation, and combined methods using both information propagation and VOTEs outperform current state-of-the-art methods for edge sign prediction. Thus, the results here do not contradict theories relating the signs of neighbouring edges; they add another layer to the prediction of edge signs. The high number of features used limits the applicability of classical machine learning methods. However, modern neural network architectures, which can prevent over-fitting in high dimensions through regularization, allow for the combination of such large feature numbers, even in networks with only a few thousand edges.

Different networks receive contributions to the prediction from very different features. In the Wiki dataset, people judge whether another person should be an administrator. Such decisions are often made based on the reputation of the requester's remarks. Moreover, there is no feedback in this relation. Thus, as expected, the main element determining the sign is the set of input edges to the target vertex (Fig. 3). A more interesting aspect is the importance of centrality measures, such as the average distance to other vertices and the K core. It seems that people are not only judged by the votes of others, but also by their centrality in the network. The importance of centrality has been previously suggested in multiple other contexts [35, 36]. Note that the centrality of people may actually be the feature that votes seek.

The Slashdot network differs from the Wiki. In this network, the striking effect is the person judging, and not the comments judged. In other words, the judgment of whether a person is trustworthy is mainly determined by the character of the judge, and less by the comments themselves. This is in excellent agreement with bounded rationality theories [37], arguing that our judgments of statements are strongly affected by our attitude, and in contrast with social theories focusing on interactions [2].

The Epinions graph is the most fascinating, since it is actually mainly affected by the topology. This can be clearly seen in Fig. 3, in the high accuracy obtained in the purely topological learning, and in Fig. 4, in the importance of topological features. Moreover, it is strongly affected by global rather than local features, showing that we do not judge other people's reviews of products by how other people review them, but rather by how central or expert the reviewers are.
The results presented here only show that topological features are correlated with edge signs, and that different networks have different mechanisms of sign determination. An important caveat of the current results is that they only suggest, but do not prove, a causal relation between the network topology and the signs. The direct estimate of such a relation is complicated by the correlation between different topological features, and by the feedback between structure and content [8, 9]. More complex methods, such as partial correlation [38] or Granger causality [39], will be required to relate topology and sign in signed networks with multiple snapshots in time.

Funding

The work of KC and RN was funded by a ministry of defense grant through RAFAEL.

References

1. Tang J., Chang Y., Aggarwal C. & Liu H. (2016) A survey of signed network mining in social media. ACM Comput. Surv., 49, Article 42.
2. Leskovec J., Huttenlocher D. & Kleinberg J. (2010) Predicting positive and negative links in online social networks. Proceedings of the 19th International Conference on World Wide Web (WWW'10), pp. 641–650.
3. Easley D. & Kleinberg J. (2010) Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Oxford, UK: Oxford University Press.
4. Newman M. (2010) Networks: An Introduction. Oxford, UK: Oxford University Press.
5. Cartwright D. & Harary F. (1956) Structural balance: a generalization of Heider's theory. Psychol. Rev., 63, 277–293.
6. Heider F. (1946) Attitudes and cognitive organization. J. Psychol., 21, 107–112.
7. Brot H., Muchnik L., Goldenberg J. & Louzoun Y. (2012) Feedback between node and network dynamics can produce real-world network properties. Phys. A, 391, 6645–6654.
8. Brot H., Muchnik L., Goldenberg J. & Louzoun Y. (2016) Evolution through bursts: network structure develops through localized bursts in time and space. Netw. Sci., 4, 293–313.
9. Henderson K., Gallagher B., Li L., Akoglu L., Eliassi-Rad T., Tong H. & Faloutsos C. (2011) It's who you know: graph mining using recursive structural features. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'11), pp. 663–671.
10. Rosen Y. & Louzoun Y. (2016) Topological similarity as a proxy to content similarity. J. Complex Networks, 4, 38–60.
11. Muchnik L., Itzhack R., Solomon S. & Louzoun Y. (2007) Self-emergence of knowledge trees: extraction of the Wikipedia hierarchies. Phys. Rev. E, 76, 16106.
12. Itzhack R., Muchnik L., Erez T., Tsaban L., Goldenberg J., Solomon S. & Louzoun Y. (2010) Empirical extraction of mechanisms underlying real world network generation. Phys. A Stat. Mech. Appl., 389, 5308–5318.
13. Sabidussi G. (1966) The centrality index of a graph. Psychometrika, 31, 581–603.
14. Razaghi Z., Kashani M., Ahrabian H., Elahi E., Nowzari-Dalini A., Saberi Ansari E., Asadi S., Mohammadi S., Schreiber F. & Masoudi-Nejad A. (2009) Kavosh: a new algorithm for finding network motifs. BMC Bioinformatics, 10, 318.
15. Batagelj V. & Zaversnik M. (2003) An O(m) algorithm for cores decomposition of networks. arXiv:cs/0310049.
16. Page L., Brin S., Motwani R. & Winograd T. (1998) The PageRank citation ranking: bringing order to the web. Technical report, Stanford Digital Library Technologies Project.
17. Grover A. & Leskovec J. (2016) node2vec: scalable feature learning for networks. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'16), pp. 855–864.
18. Guha R., Kumar R., Raghavan P. & Tomkins A. (2004) Propagation of trust and distrust. Proceedings of the 13th International Conference on World Wide Web (WWW'04), pp. 403–412.
19. Breiman L. (2001) Random forests. Mach. Learning, 45, 5–32.
20. Zhu J., Zou H., Rosset S. & Hastie T. (2009) Multi-class AdaBoost. Stat. Interface, 2, 349–360.
21. Milo R., Shen-Orr S., Itzkovitz S., Kashtan N., Chklovskii D. & Alon U. (2002) Network motifs: simple building blocks of complex networks. Science, 298, 824–827.
22. Itzhack R., Mogilevski Y. & Louzoun Y. (2007) An optimal algorithm for counting network motifs. Phys. A Stat. Mech. Appl., 381, 482–490.
23. Wasserman S. & Faust K. (1994) Social Network Analysis: Methods and Applications. Cambridge, New York: Cambridge University Press.
24. Barabási A.-L. & Albert R. (1999) Emergence of scaling in random networks. Science, 286, 509–512.
25. Schapire R. E. & Singer Y. (1999) Improved boosting algorithms using confidence-rated predictions. Mach. Learning, 37, 297–336.
26. Liaw A. & Wiener M. (2002) Classification and regression by randomForest. R News, 2/3, 18–22.
27. Suykens J. & Vandewalle J. (1999) Least squares support vector machine classifiers. Neural Process. Lett., 9, 293–300.
28. Deng L. & Yu D. (2014) Deep learning: methods and applications. Found. Trends Signal Process., 7, 197–387.
29. Schmidhuber J. (2015) Deep learning in neural networks: an overview. Neural Networks, 61, 85–117.
30. LeCun Y., Bengio Y. & Hinton G. (2015) Deep learning. Nature, 521, 436–444.
31. Arel I., Rose D. C. & Karnowski T. P. (2010) Deep machine learning: a new frontier in artificial intelligence research. IEEE Comput. Intell. Mag., 5, 13–18.
32. Khodadadi A. & Jalili M. (2017) Sign prediction in social networks based on tendency rate of equivalent micro-structures. Neurocomputing, 257, 175–184.
33. Rosen Y. & Louzoun Y. (2014) Directionality of real world networks as predicted by path length in directed and undirected graphs. Phys. A Stat. Mech. Appl., 401, 118–129.
34. Itzhack R., Tsaban L. & Louzoun Y. (2013) Long loops of information flow in genetic networks highlight an inherent directionality. Syst. Biomed., 1, 47–54.
35. Wuchty S. & Almaas E. (2005) Peeling the yeast protein network. Proteomics, 5, 444–449.
36. Kumar R., Novak J. & Tomkins A. (2010) Structure and evolution of online social networks. Link Mining: Models, Algorithms, and Applications (Yu P., Han J. & Faloutsos C. eds). New York, NY: Springer, pp. 337–357.
37. Tversky A. & Kahneman D. (1981) The framing of decisions and the psychology of choice. Science, 211, 453–458.
38. Fisher R. (1924) The distribution of the partial correlation coefficient. Metron, 3, 329–332.
39. Granger C. W. J. (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37, 424.

© The authors 2018. Published by Oxford University Press. All rights reserved.

Edge sign prediction based on a combination of network structural topology and sign propagation

Loading next page...
 
/lp/ou_press/edge-sign-prediction-based-on-a-combination-of-network-structural-4l509j0kzn
Publisher
Oxford University Press
Copyright
© The authors 2018. Published by Oxford University Press. All rights reserved.
ISSN
2051-1310
eISSN
2051-1329
D.O.I.
10.1093/comnet/cny012
Publisher site
See Article on Publisher Site

Abstract

Abstract The prediction of edge signs in social and biological networks is a major goal of graph-based machine learning and has important implication in recommendation systems. Most current edge sign prediction methods rely on information propagation from neighbouring edges either directly by assuming sign similarity in neighbouring edges or using more complex theories based on combination of edge signs in neighbours. Such methods rely on a high network sampling fraction, and fail at low sampling level. We, here, show that edges with similar network topology, as defined by a combination of network measures have similar signs. The surprising correlation between network topology and edge sign can be used for prediction. Indeed, machine learning algorithm based on this topology can produce a higher accuracy than state of the art methods in standard datasets, even when a very small fraction of the edge signs are known, with an accuracy of up to 93%. We further show that different datasets differ in the importance of different features. A combination of features is always required to obtain a high area under the curve. When the vertices represent people, the sign is mainly affected by the edge target. When the network represents opinions, the signs are mainly affected by the edge source. The proposed method can be applied to directed and undirected, weighted and unweighted networks. 1. Introduction Edges in social and biological networks can have different signs, representing among many other possibilities people with a positive or negative attitude to other people or proteins/neurons that activate or inhibit other proteins/neurons. Edge sign prediction has attracted much interest, mainly in social networks [1]. Sign prediction has multiple real-world applications, such as testing the legitimacy of connection between people and detecting edges that have a predictive power for the properties of vertices. The edge sign prediction problem can be defined as follows: assume a network $$G = (V,E)$$, where $$V$$ are vertices and $$E$$ are edges, with signs on all edges, but only a limited fraction of the signs are known. The edge sign prediction problem is to reliably infer the sign of an edge with a hidden sign, using $$V,E$$ and the known signs. Most edge sign predictions are based on the assumption that neighbouring edges have similar signs, and thus edge sign can be estimated based on sign propagation algorithms [2]. Similar principles of homophily and peer influence have been applied to vertex colour/sign prediction [3, 4]. Note that while it may be hard to distinguish between homophily and peer influence, such an identification is not required for prediction. Sign propagation models are based on two main social theories: ‘balance’ and ‘status’. The ‘balance’ theory represents the assumptions that: ‘the friend of my friend is my friend’ (i.e., ($$u,v)$$ and ($$v,w)$$ positive edges increase the probability of a ($$u,w)$$ positive edge), ‘the enemy of friend is my enemy’ and similar combinations [5]. The ‘status’ theory represents the assumption that one refers in a positive way to a person that has higher status than him, and in a negative way to person with lower status than him [6]. A crucial limitation of such theories for machine learning application is the requirement of a large enough number of edges with known signs surrounding the edge for which the sign is predicted. Formally, assume a graph with positive or negative sign on each edge. 
Our goal is to predict the sign of edges with unknown signs, using a limited fraction of signed edges (e.g., 20% of known signs, with the target of predicting the other 80%), denoted here as the partial information setup. Such a setup often occurs when ties are known, but for the observed tie, the sign is unknown for a large fraction of the edges. Note that this is different than most sign-prediction setups, where one assumes that the sign of all edges but one are known, and one attempts to predict the unknown edge sign [2, 7], denoted here as a full information setup. The full information setup is obviously easier to solve, since for high average degree network, the vast majority of edges have neighbours with known signs. We, here, propose that a different principle may be used to predict edge sign even in sparsely sampled networks. We have recently proposed based on theoretical and experimental observations that the dynamics affecting edge and node signs is coupled with the edge addition and deletion dynamics [8, 9]. This coupling suggests that the sign of an edge and the topology of the graph around it may be also coupled, since the graph topology is determined by the edge addition and removal dynamics. Such a coupling would emerge through two possible mechanism. Either the sign of ($$u,v)$$ can affect the network dynamics of vertices $$u$$ and $$v$$, and these dynamics are mirrored in the network attributes of $$u$$ and $$v$$, and the relation between them, or the sign of an edge is determined by the network structure around the edge. In both cases, edges with similar topological features may have similar signs. For example, if feedback circles encourage positive reactions, then the number of coherent circles (u> v-> w-> u) may be correlated with a positive edge sign. In such a case, this feature may be used to predict the edge sign. Over the past decade, multiple studies by ourselves and others have proposed correlations between vertex/edge properties and graph structure [10–13]. Herein, we show that the same concept holds for edge signs, but that this relation is highly network dependent. Moreover, the correlation between the graph topology and the edge signs is strong enough to classify edges even in a low information setup, and when combined with very simple information propagation, it leads to better accuracies than the ones obtained in the full information setup for standard test problems. Using the current approach, even in networks, where the signs are scarcely sampled, the remaining signs can be predicted with an accuracy ranging between 82% and 93% in multiple real-world networks. The difference between the current approach and sign propagation approaches can be seen in Fig. 1. In Box 1, we can assume by sign propagation that the unknown edge will be probably positive. However, in Box 2, prediction of the unknown sign only by sign propagation will be inaccurate, since both input and output have low degrees, and mixed edge signs. This becomes even more complicated if only a small fraction of edges will have known signs. Fig. 1. View largeDownload slide (A) Box 1—In this rectangle, edge signs are best computed by sign propagation. Box 2—here, sign propagation would prove difficult to use. Boxes 3 and 4 have different signs but similar topology and may be used to infer from one to the other. (B) Thirteen different non-isomorphic directed graphs with three vertices are used in the current analysis. For each edge, we count how many times it is located in each of these motifs. 
The numbers in Fig. 3 correspond to the parallel numbers here. (C) For each vertex feature, we take the mean and the subtraction of the features of the target and source vertices to produce edge features. The subtraction is set to be source vertex minus target vertex divided by two, with no loss of generality. Fig. 1. View largeDownload slide (A) Box 1—In this rectangle, edge signs are best computed by sign propagation. Box 2—here, sign propagation would prove difficult to use. Boxes 3 and 4 have different signs but similar topology and may be used to infer from one to the other. (B) Thirteen different non-isomorphic directed graphs with three vertices are used in the current analysis. For each edge, we count how many times it is located in each of these motifs. The numbers in Fig. 3 correspond to the parallel numbers here. (C) For each vertex feature, we take the mean and the subtraction of the features of the target and source vertices to produce edge features. The subtraction is set to be source vertex minus target vertex divided by two, with no loss of generality. However, one may instead use network topologies, and classify the vertices based on the topology of the network surrounding them. If edges with similar topologies have typically similar signs, we can infer the sign of the current edge. Different topological elements can be used to classify edges, such as small scale motifs (Fig. 1B), degree, centrality, clustering and others network features [14–17]. It is important to note that we do not use the sign of the edges in these features, so that the features can be computed on all edges, even if the majority of those have hidden signs. Similar approaches have been developed in order to classify the colour of vertices (see for example among many others [10, 11, 18]). We now apply the same principles to edge sign prediction. 2. Methods 2.1 Algorithm description The sign prediction algorithm consists of three steps, as further detailed below: Each edge is associated with a vector of topological edge properties (VOTE), such as centrality measures, degrees and small scale motifs frequencies. Each dimension (attribute) is normalized to produce $$z$$ scores. Neural networks and other machine learning approaches are applied on the VOTEs. The algorithms used here were among others: Random Forest [19], AdaBoost [20], support vector machine (SVM) and feed forward neural networks. However, any other classifier can be used. 2.2 Vector of topological edge properties We define for each edge in the network a VOTE. This VOTE is a set of discrete/continuous values representing a large set of local and global network measures. Note that the VOTE of an edge is not affected by its classification. The following measures were used for each VOTE: In-degree, out-degree. The number of vertices that are connected to a vertex via an edge. The value for each edge is computed by taking the mean and the subtraction of the vertices surrounding the edge. Three vertices small scale motifs [21]: all possible directed connected sub-graphs with three vertices containing the current edge (taking into consideration isomorphisms) were counted. The well-known clustering coefficient measure was omitted due to its redundancy, since triplets and triangles are already counted in size three motifs [22]. For each motif, we increment the count of all the edges that participate in the motif by one. Each edge has a vector of its contribution to 13 possible motifs (Fig. 1B). 
Shortest path measures: two measures of different aspects of the participation of a vertex in shortest paths of the network are Closeness centrality [23], as well as the first and second moments of the distance distribution as estimated by the breadth-first search (BFS) algorithm. In order to translate vertex attributes to edge attributes in the VOTEs we perform a subtraction and average of the vertices attributes (Fig. 1C). Extended walks measures: a measure that implements ranking by walks on the network beyond the shortest paths is Pagerank [17]. The Pagerank score expresses the probability of reaching a vertex by a random walk (random surfer model). 2.3 Z scoring The VOTEs contain a variety of measures that can be discrete or continuous and bounded or unbounded (up to very large number limited by a function of the total vertex or edge number). Measures of networks, such as degrees and motif frequencies, are often heavy tailed [24]. We used their log to avoid extreme values. Attributes with either positive or zero values were assigned a minimal value in the case of 0, before a 10-base log was applied (0.01 in the current realization). All measures were then $$Z$$ scored to a 0 mean and a unit standard deviation, leading to the following $$z$$ score for a non-negative heavy tail variable $$x$$.   \begin{align} \begin{split} z&=\log_{10} (x+0.01) \\ z&=\frac{z-\left\langle z \right\rangle }{std(z)} \\ \end{split} \end{align} (1) 2.4 Classification of edges based on their topological properties The VOTE method can be implemented using any binary classifier. Herein, we tested it with several of the most common algorithms, all implemented using the Scikit-Learn python package. In addition, we applied deep learning using the Keras package feed forward neural networks, with a Theano backend. The percentage of train and test samples was 80% and 20%, respectively, unless explicitly stated otherwise. The following classifiers were used: AdaBoost classifier [25]: the maximum number of estimators at which boosting is terminated was set to 100. In case of a perfect fit, the learning procedure was stopped. Random Forest (RF) classifier [26]: the number of trees in the forest was set to 1000. The splitting criterion was defined to be the Gini index. The maximum depth of the tree was set to 3. The minimum number of samples required to split an internal vertex was 15. The class weight was defined to be ‘balanced’, which uses the class frequency to automatically adjust weights inversely proportional to class frequencies. SVM with stochastic gradient descent (SGD): we performed simple SGD with loss function and regularization as in the SVM [27] loss function. Our regularization was also standard as SVM regularization. Deep learning: the network used was composed of three layers. We used the ReLU activation function, 0.4 rate regularization of l1_l2 and a dropout rate of 0.4 rate for the internal layers. The output layer activation function was a sigmoid. The loss function was binary cross-entropy. The network was fully connected [28–31]. All other parameters were the Keras default parameters. The optimizer was the ‘ADAM’ stochastic gradient. 2.5 Statistical analysis The precisions of the classifiers were based on the AUC (area under the curve). The AUC is a common evaluation metric for binary classification problems, which is based on the relation between the false positive rate and the true positive rate. For the comparison with previous results max. accuracy was used for consistence. 3. 
3. Results

In order to show that topological features, such as the clustering coefficient, degree, small-scale motif distribution and centrality, are correlated with edge signs, and can be used to predict the sign of edges, we analysed multiple standard networks on which state-of-the-art edge sign prediction algorithms have been tested:

Epinions is a review Website, where users can trust (+) or distrust (−) each other's reviews. This dataset has 131 828 vertices and 841 372 edges. Signs in Epinions represent the attitude of the reviewer and the quality of the review.

Slashdot is a tech blog where bloggers can like (+) or dislike (−) other bloggers' comments. This dataset has 82 144 vertices and 549 202 edges.

Wikipedia is the request-for-admin vote network, where a positive link represents a vote for a user and a negative link a vote against. Some links have neutral signs, representing neutral votes; we did not include edges with neutral signs in the prediction. This dataset has 10 835 vertices and 159 388 edges.

For each of these networks, we computed for each edge a VOTE, using a combination of edge-specific properties and vertex-specific properties, where the latter are combined into edge properties through the difference and the mean of the two vertices connected by the edge (Fig. 1C). Formally, assume an edge $(u,v)$ from vertex $u$ to vertex $v$, with a set of edge attributes $t_{(u,v)}$, and vertex attributes $m_u, m_v$ for each surrounding vertex. We define the VOTE of the edge $(u,v)$ to be:
\begin{equation}
V_{(u,v)} = \left[ t_{(u,v)}, \frac{m_u + m_v}{2}, m_v - m_u \right],
\end{equation} (2)
where all elements are row vectors. We chose the average and the difference, instead of the features of each vertex by itself, to produce an edge-centric approach, representing, for example, the average degree of an edge as the average number of edges pointing to either of the vertices surrounding it. The second element (the difference) incorporates the directionality, by representing the difference between the source and target vertices of the edge. Note that other linear combinations could have been used; since we are using a neural network approach, even if the current combination is not optimal, we expect the network weights to produce an optimal linear combination based on the current representation. In the current realization, the VOTE vector has 31 graph-topology based features: $2\times 2$ in- and out-degrees, 13 motifs, $5\times 2$ centrality measures and $2\times 2$ community measures. It is easy to see that the topology features can be computed for each edge in the graph regardless of the sign of the edge and, more importantly, regardless of the signs of its neighbouring edges, in contrast with methods based on the class of the neighbours.

To show that VOTEs are indeed correlated with the signs of edges, the Spearman correlation of each feature with the edge sign was computed (a sketch of this computation follows the figure caption below). The correlations vary over a wide range, but some features, such as specific small-scale motifs, reach a correlation of −0.6 in the Epinions network. In other networks, the correlations are lower, but still highly significant (Fig. 2).

Fig. 2. Spearman correlations between features and the plus sign in the different networks studied. The features are ranked from bottom to top according to the sum of the absolute values of their correlations in all three networks.
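The correlation computation behind Fig. 2 is straightforward; the sketch below, assuming scipy, takes a VOTE matrix X (one row per edge) and a sign vector y, both of which are illustrative names.

```python
# A minimal sketch of the feature/sign Spearman correlations of Fig. 2;
# X is an (edges x features) VOTE matrix and y the +/-1 sign vector.
import numpy as np
from scipy.stats import spearmanr

def feature_sign_correlations(X, y):
    rhos = np.array([spearmanr(X[:, j], y).correlation
                     for j in range(X.shape[1])])
    order = np.argsort(-np.abs(rhos))  # rank features by |correlation|
    return rhos, order
```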
One can clearly see that in the Epinions network the edge sign is the most strongly associated with the network features, but that correlations exist in all networks. Moreover, the correlations of each feature differ between networks.

To show that such VOTEs can indeed detect edge signs, binary machine learning was applied to the VOTEs of edges with the classification
\begin{equation}
y_{(u,v)} \in \{+,-\}.
\end{equation} (3)
Multiple learning methods were tested, with $V_{(u,v)}$ as the input for each edge and $y_{(u,v)}$ as the output (blue bars in Fig. 3A, C and E). We used standard machine learning algorithms: AdaBoost (Adaptive Boosting), Random Forest, SVM-SGD (stochastic gradient descent) and feed-forward deep networks. On average over all networks and conditions, the deep learning classifier had the best performance on both train and test sets (last group of bars in Fig. 3 for the test set). The prediction of each category was estimated using the area under the receiver operating characteristic (ROC) curve (AUC). In order to compare our results to previous publications, we also report the maximal accuracy (i.e., the accuracy with the optimal threshold, as computed on the train set; Table 1). The deep learning results were significantly better than random for all networks. In contrast, some of the boosting methods failed to produce significant results. In the following sections, we continue only with the deep learning results (rightmost bars in Fig. 3A, C and E). These results are computed in the low information setup, where the signs of the edges are divided into test and train compartments, with a test fraction of 0.2 (a sketch of this evaluation follows the figure caption below).

Fig. 3. AUC results ($y$ axes in all panels) for the test set, using four different algorithms with either only topological features (blue), only sign propagation features (gray) or combined topology and sign propagation features (black). In the left plots, the train fraction is 0.8 and the test fraction is 0.2. The right plots parallel the left plots and show the same results, for deep learning only, as a function of the train fraction. The subplots are: (A and B) Wiki, (C and D) Slashdot, (E and F) Epinions. The error bars represent the standard errors over three random train/test divisions. The $x$ axes in B, D and F are on a log scale. One can clearly see that as the training size shrinks, the quality of the propagation-based methods decreases, but the AUC of the topology-based methods stays approximately constant.
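The low information evaluation can be sketched as follows, assuming scikit-learn; X and y are as above, and build_classifier is the Keras sketch from the Methods. The default epoch count is an arbitrary placeholder.

```python
# A minimal sketch of the train/test evaluation reported in Fig. 3.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def evaluate(X, y, train_fraction=0.8, epochs=50):
    y01 = (np.asarray(y) > 0).astype(int)  # map +/- signs to 1/0
    X_tr, X_te, y_tr, y_te = train_test_split(X, y01, train_size=train_fraction)
    model = build_classifier(X.shape[1])
    model.fit(X_tr, y_tr, epochs=epochs, verbose=0)
    # AUC of the test-set scores, as in Fig. 3.
    return roc_auc_score(y_te, model.predict(X_te).ravel())
```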
Table 1. Comparison of maximal accuracy with previous state-of-the-art classifiers. The results represent the test set accuracy. Min(Em) is the minimal number of common neighbours between the source and target of the edges studied. The VOTE model is applied in the partial information setup, while all others are applied in the full information setup; still, the VOTE model outperforms the existing algorithms in many categories. Some of the existing algorithms require a minimal number of common neighbours; the comparison was performed to match the setup of the previous studies.

Network    Min(Em)   VOTE    Leskovec (2010)   Jalili (2017)
Wiki          0      0.82        0.8021            —
Wiki          1      0.825         —              0.7927
Wiki         10      0.812       0.8              0.7508
Slashdot      0      0.836       0.8                —
Slashdot      1      0.85          —              0.8217
Slashdot     10      0.892       0.89             0.8932
Epinions      0      0.9         0.865              —
Epinions      1      0.926         —              0.9083
Epinions     10      0.924       0.93             0.9208

The learning results clearly show that the network topology contains information about the classification. However, this information may completely overlap with the information available to sign propagation approaches. In order to test whether the information available in the topology can be combined with sign propagation to improve the accuracy, we reproduced a simple information propagation algorithm, in which four more dimensions were added to the classification: the numbers of $+$ and $-$ signs on the incoming and outgoing edges of a given vertex in the train set,
\begin{align}
\begin{split}
S_{(u,v)} &= \left[ +_{(u,w)}, +_{(w,v)}, -_{(u,w)}, -_{(w,v)} \right]\\
&\quad \text{s.t. } (u,w) \in \text{train},\ (w,v) \in \text{train}.
\end{split}
\end{align} (4)
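A minimal sketch of the four features of Eq. (4), assuming the networkx graph G from the earlier sketches and a dict signs mapping train-set edges to ±1 (both names are illustrative):

```python
# A minimal sketch of the sign-propagation features of Eq. (4); signs holds
# only the train-set edge signs, so hidden (test) signs are never used.
def sign_propagation_features(G, signs):
    feats = {}
    for u, v in G.edges():
        # Signed train edges leaving the source u and entering the target v,
        # excluding the edge (u, v) itself.
        pos_out = sum(1 for w in G.successors(u) if w != v and signs.get((u, w)) == 1)
        neg_out = sum(1 for w in G.successors(u) if w != v and signs.get((u, w)) == -1)
        pos_in = sum(1 for w in G.predecessors(v) if w != u and signs.get((w, v)) == 1)
        neg_in = sum(1 for w in G.predecessors(v) if w != u and signs.get((w, v)) == -1)
        feats[(u, v)] = [pos_out, pos_in, neg_out, neg_in]
    return feats
```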
Note that these features, in contrast with $V_{(u,v)}$, are sensitive to the train/test division. Four parallel features, for the inputs of the source vertex and the outputs of the target vertex, could be added; however, following Leskovec et al. [2], we limit ourselves to these four features.

In all networks studied, the precision obtained when using $S_{(u,v)}$ was higher than when using $V_{(u,v)}$, as expected given the sign similarity between neighbouring edges (gray vs. blue bars in Fig. 3). In the deep learning approach, the combined method (using both $S_{(u,v)}$ and $V_{(u,v)}$; black bars in Fig. 3) always outperforms each one separately, suggesting that these two types of information are complementary. The deep neural network with the combined features outperforms the latest studies [32], with an AUC of 0.902 on the Wiki graph and an AUC of 0.97 on the Epinions graph, although we perform the analysis in the low information setup, while previous studies used the easier full information setup. Moreover, the current method is precise even when all edges are analysed, while previous methods have focused on edges with a minimal number of common neighbours between the source and target vertices (see Table 1 for a detailed comparison).

To test whether the train set size can be further reduced, we repeated the analysis with varying train set sizes. In the networks studied, the accuracy of the classification based on graph topology features is practically independent of the train size, down to a 5% train size. The sign propagation features perform better when the train set is large, since more signed close neighbours can then be used. The combined classification follows the sign propagation (Fig. 3B, D and F).

In order to understand which of the features contribute to edge sign prediction, we computed, in the RF analysis, the contribution to the out-of-bag error (a closely related computation is sketched after the figure caption below). One can clearly see that, per feature, information propagation contributes more than the topological features (Fig. 4, last four rows) and is the most informative method, as previously shown [2]. However, information propagation is limited to far fewer features than topology, and the introduction of topology significantly improved the performance. Moreover, for very small training set sizes, topology by itself is as informative as information propagation. Finally, topology allows for a precise prediction in low-degree vertices, where information propagation is limited.

Fig. 4. (A) Contribution to the out-of-bag error for the most influential features in the random forest formalism. The features are ranked according to the sum of their contributions to the classification in all three networks. One can clearly see that different networks are affected by different features. (B) AUC as a function of the number of topological features used, when only topology is used. A high AUC is only obtained when a high number of features is used. Error bars are the standard error of the AUC over three random train/test divisions.
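The feature ranking of Fig. 4A can be approximated as follows. Scikit-learn does not directly expose the per-feature contribution to the out-of-bag error, so the sketch below uses permutation importance on a held-out set as a closely related stand-in, with the random forest settings of Section 2.4.

```python
# A sketch of a per-feature importance estimate; permutation importance on a
# held-out set is used here as a stand-in for the out-of-bag contribution.
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

def rf_feature_importance(X_tr, y_tr, X_te, y_te):
    rf = RandomForestClassifier(n_estimators=1000, criterion='gini',
                                max_depth=3, min_samples_split=15,
                                class_weight='balanced')
    rf.fit(X_tr, y_tr)
    # Mean drop in score when each feature is shuffled (larger = more influential).
    result = permutation_importance(rf, X_te, y_te, n_repeats=10)
    return result.importances_mean
```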
Within the information propagation features, a clear difference was observed between the networks. While for the Wiki the numbers of positive and negative edges pointing to the vertex are the most influential, for the Slashdot network the most influential feature is the number of negative edges that the vertex points to. For the Epinions network, the main contribution actually comes from structural features, such as the Pagerank, the closeness and motifs 2 and 7. Finally, for the Wiki, the K core is the most influential structural feature. The difference in feature contributions among networks further reinforces the need to use a large array of features, and not to limit the analysis to specific features to classify vertices, as we and others have previously done [12, 33, 34]. The interpretation of these opposite contributions is further discussed below.

In order to further ensure that the topology-based classification is not the artefact of a single, trivially highly correlated feature (e.g., the degree), we sorted all features based on their Spearman correlation coefficient, and only used the 5, 10 or 20 most correlated/anticorrelated features, as sketched below. One can clearly see (Fig. 4B) that the AUC increases with the number of features used, suggesting a classification based on a combination of features, each contributing a limited amount of information, as expected from the out-of-bag error (Fig. 4A).
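The top-k analysis of Fig. 4B reuses the two sketches above: features are ranked by their absolute Spearman correlation with the sign, and the classifier is retrained on the k best ones (k = 5, 10 or 20 in the text).

```python
# A minimal sketch of the Fig. 4B analysis, reusing feature_sign_correlations
# and evaluate from the sketches above.
def auc_with_top_k_features(X, y, k, train_fraction=0.8):
    _, order = feature_sign_correlations(X, y)
    return evaluate(X[:, order[:k]], y, train_fraction=train_fraction)
```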
4. Discussion

Most current edge sign prediction methods are based on local information propagation: the sign of an edge is assumed to be correlated with the signs of neighbouring edges, or with combinations of such signs. These methods fail for edges not surrounded by known signed edges, such as isolated edges, or edges in networks with limited sign information. We here propose an alternative approach. We assume that the topological features of the network surrounding an edge (not only its ego-network, but the properties of the edge in the full network) are correlated with its sign. This can happen, for example, if both network structure and sign are affected by a common driving mechanism (e.g., friendly people tend to create positive signs, but also higher degrees). We and others have previously developed such models for vertex prediction, and termed them the Network Attribute Vector (NAV) approach [11]. In the NAV approach, relational information, as represented by an interaction network, is translated into vectors of real numbers, taking advantage of the rich characterization of structural attributes accumulated in network science. For the NAV, we selected a variety of properties on local and global scales. These properties are far from being the only possible ones, and many other similar combinations are possible, with probably similar results. This concept relies on evidence that motifs and centrality measures are associated with vertex function. For example, a small set of motifs is abundant in gene regulation, subgraph centrality and total communicability can be used to identify prominent proteins, and hierarchy, energy and flow measures enable differentiating between specific neuronal populations by their position along the signal propagation.

This led to the conjecture that underlies the NAV algorithm: specific subsets of vertices have typical topological profiles. Given the relation between NAVs and the properties of vertices (e.g., sub-cellular location), we here extend the concept to edges, and propose a model where edge signs can be learnt based on the parallel of NAVs for edges. This required the development of topological features for edges. Such features can be based, for example, on a combination of the features of the vertices surrounding the edge. Another alternative is the development of edge-specific measures, such as the frequency of sub-graphs containing a specific edge. As was the case for vertices, the current set of features is far from comprehensive, and other, parallel features can be developed. We term the feature vector for edges the VOTE.

The information available in the VOTE is complementary to the information that can be obtained through information propagation, and combined methods, using both information propagation and the VOTE, outperform current state-of-the-art methods for edge sign prediction. Thus, the results here do not contradict theories relating the signs of neighbouring edges; they add another layer to the prediction of edge signs. The high number of features used limits the applicability of such methods in classical machine learning approaches. However, modern neural network topologies, which can prevent over-fitting in large dimensions through regularization, allow for the combination of such large feature numbers, even in networks with a few thousand edges.

Different networks receive contributions to the prediction from very different features. In the Wiki dataset, people judge whether another person should be an administrator. Such decisions are often made based on the reputation of the requester's remarks. Moreover, there is no feedback in this relation. Thus, as expected, the main elements determining the sign are the input edges to the target vertex (Fig. 3). A more interesting aspect is the importance of centrality measures, such as the average distance to other vertices and the K core. It seems that people are judged not only by the votes of others, but also by their centrality in the network. The importance of centrality has been previously suggested in multiple other contexts [35, 36]. Note that the centrality of people may actually be the feature that the votes seek.

The Slashdot network differs from the Wiki. In this network, the striking effect is that of the person judging, not of the comments judged. In other words, the judgment of whether a person is trustworthy is mainly determined by the character of the judge, and less by the comments themselves. This is in excellent agreement with bounded rationality theories [37], which argue that our judgments of statements are strongly affected by our attitude, and in contrast with social theories focusing on interactions [2].

The Epinions graph is the most fascinating, since it is mainly affected by the topology. This can be clearly seen in Fig. 3, in the high accuracy obtained in the purely topological learning, and in Fig. 4, in the importance of topological features. Moreover, it is strongly affected by global, and not local, features, showing that we do not judge other people's reviews of products by how other people review them, but rather by how central or expert the reviewers are.
The results presented here only show that topological features are correlated with edge signs, and that different networks have different mechanisms of sign determination. An important caveat of the current results is that they only suggest, but do not prove, a causal relation between the network topology and the signs. The direct estimation of such a relation is complicated by the correlations between different topological features, and by the feedback between structure and content [8, 9]. More complex methods, such as partial correlation [38] or Granger causality [39], will be required to relate topology and sign in signed networks with multiple snapshots in time.

Funding

The work of KC and RN was funded by a Ministry of Defense grant through RAFAEL.

References

1. Tang, J., Chang, Y., Aggarwal, C. & Liu, H. (2016) A survey of signed network mining in social media. ACM Computing Surveys (CSUR), 49, Article 42.
2. Leskovec, J., Huttenlocher, D. & Kleinberg, J. (2010) Predicting positive and negative links in online social networks. Proceedings of the 19th International Conference on World Wide Web (WWW'10), pp. 641–650.
3. Easley, D. & Kleinberg, J. (2010) Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Oxford, UK: Oxford University Press.
4. Newman, M. (2010) Networks: An Introduction. Oxford, UK: Oxford University Press.
5. Cartwright, D. & Harary, F. (1956) Structural balance: a generalization of Heider's theory. Psychol. Rev., 63, 277–293.
6. Heider, F. (1946) Attitudes and cognitive organization. J. Psychol., 21, 107–112.
7. Brot, H., Muchnik, L., Goldenberg, J. & Louzoun, Y. (2012) Feedback between node and network dynamics can produce real-world network properties. Phys. A, 391, 6645–6654.
8. Brot, H., Muchnik, L., Goldenberg, J. & Louzoun, Y. (2016) Evolution through bursts: network structure develops through localized bursts in time and space. Netw. Sci., 4, 293–313.
9. Henderson, K., Gallagher, B., Li, L., Akoglu, L., Eliassi-Rad, T., Tong, H. & Faloutsos, C. (2011) It's who you know: graph mining using recursive structural features. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'11), pp. 663–671.
10. Rosen, Y. & Louzoun, Y. (2016) Topological similarity as a proxy to content similarity. J. Complex Networks, 4, 38–60.
11. Muchnik, L., Itzhack, R., Solomon, S. & Louzoun, Y. (2007) Self-emergence of knowledge trees: extraction of the Wikipedia hierarchies. Phys. Rev. E, 76, 016106.
12. Itzhack, R., Muchnik, L., Erez, T., Tsaban, L., Goldenberg, J., Solomon, S. & Louzoun, Y. (2010) Empirical extraction of mechanisms underlying real world network generation. Phys. A Stat. Mech. Appl., 389, 5308–5318.
13. Sabidussi, G. (1966) The centrality index of a graph. Psychometrika, 31, 581–603.
14. Razaghi, Z., Kashani, M., Ahrabian, H., Elahi, E., Nowzari-Dalini, A., Saberi Ansari, E., Asadi, S., Mohammadi, S., Schreiber, F. & Masoudi-Nejad, A. (2009) Kavosh: a new algorithm for finding network motifs. BMC Bioinformatics, 10, 318.
15. Batagelj, V. & Zaversnik, M. (2003) An O(m) algorithm for cores decomposition of networks. arXiv:cs/0310049.
16. Page, L., Brin, S., Motwani, R. & Winograd, T. (1998) The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford Digital Library Technologies Project.
17. Grover, A. & Leskovec, J. (2016) node2vec: scalable feature learning for networks. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'16), pp. 855–864.
18. Guha, R., Kumar, R., Raghavan, P. & Tomkins, A. (2004) Propagation of trust and distrust. Proceedings of the 13th International Conference on World Wide Web (WWW'04), pp. 403–412.
19. Breiman, L. (2001) Random forests. Mach. Learning, 45, 5–32.
20. Zhu, J., Zou, H., Rosset, S. & Hastie, T. (2009) Multi-class AdaBoost. Stat. Interface, 2, 349–360.
21. Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D. & Alon, U. (2002) Network motifs: simple building blocks of complex networks. Science, 298, 824–827.
22. Itzhack, R., Mogilevski, Y. & Louzoun, Y. (2007) An optimal algorithm for counting network motifs. Phys. A Stat. Mech. Appl., 381, 482–490.
23. Wasserman, S. & Faust, K. (1994) Social Network Analysis: Methods and Applications. New York: Cambridge University Press.
24. Barabási, A.-L. & Albert, R. (1999) Emergence of scaling in random networks. Science, 286, 509–512.
25. Schapire, R. E. & Singer, Y. (1999) Improved boosting algorithms using confidence-rated predictions. Mach. Learning, 37, 297–336.
26. Liaw, A. & Wiener, M. (2002) Classification and regression by randomForest. R News, 2/3, 18–22.
27. Suykens, J. & Vandewalle, J. (1999) Least squares support vector machine classifiers. Neural Process. Lett., 9, 293–300.
28. Deng, L. & Yu, D. (2014) Deep learning: methods and applications. Foundations and Trends in Signal Processing, 7, 197–387.
29. Schmidhuber, J. (2015) Deep learning in neural networks: an overview. Neural Networks, 61, 85–117.
30. LeCun, Y., Bengio, Y. & Hinton, G. (2015) Deep learning. Nature, 521, 436–444.
31. Arel, I., Rose, D. C. & Karnowski, T. P. (2010) Deep machine learning: a new frontier in artificial intelligence research. IEEE Computational Intelligence Magazine, 5, 13–18.
32. Khodadadi, A. & Jalili, M. (2017) Sign prediction in social networks based on tendency rate of equivalent micro-structures. Neurocomputing, 257, 175–184.
33. Rosen, Y. & Louzoun, Y. (2014) Directionality of real world networks as predicted by path length in directed and undirected graphs. Phys. A Stat. Mech. Appl., 401, 118–129.
34. Itzhack, R., Tsaban, L. & Louzoun, Y. (2013) Long loops of information flow in genetic networks highlight an inherent directionality. Syst. Biomed., 1, 47–54.
35. Wuchty, S. & Almaas, E. (2005) Peeling the yeast protein network. Proteomics, 5, 444–449.
36. Kumar, R., Novak, J. & Tomkins, A. (2010) Structure and evolution of online social networks. Link Mining: Models, Algorithms, and Applications (Yu, P., Han, J. & Faloutsos, C., eds). New York, NY: Springer, pp. 337–357.
37. Tversky, A. & Kahneman, D. (1981) The framing of decisions and the psychology of choice. Science, 211, 453–458.
38. Fisher, R. (1924) The distribution of the partial correlation coefficient. Metron, 3, 329–332.
39. Granger, C. W. J. (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37, 424–438.

© The authors 2018. Published by Oxford University Press. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).
