# Measuring partial balance in signed networks

Measuring partial balance in signed networks Abstract Is the enemy of an enemy necessarily a friend? If not, to what extent does this tend to hold? Such questions were formulated in terms of signed (social) networks and necessary and sufficient conditions for a network to be ‘balanced’ were obtained around 1960. Since then the idea that signed networks tend over time to become more balanced has been widely used in several application areas. However, investigation of this hypothesis has been complicated by the lack of a standard measure of partial balance, since complete balance is almost never achieved in practice. We formalize the concept of a measure of partial balance, discuss various measures, compare the measures on synthetic datasets and investigate their axiomatic properties. The synthetic data involves Erdős–Rényi and specially structured random graphs. We show that some measures behave better than others in terms of axioms and ability to differentiate between graphs. We also use well-known data sets from the sociology and biology literature, such as Read’s New Guinean tribes, gene regulatory networks related to two organisms, and a network involving senate bill co-sponsorship. Our results show that substantially different levels of partial balance is observed under cycle-based, eigenvalue-based and frustration-based measures. We make some recommendations for measures to be used in future work. 1. Introduction Transitivity of relationships has a pivotal role in analysing social interactions. Is the enemy of an enemy a friend? What about the friend of an enemy or the enemy of a friend? Network science is a key instrument in the quantitative analysis of such questions. Researchers in the field are interested in knowing the extent of transitivity of ties and its impact on the global structure and dynamics in communities with positive and negative relationships. Whether the application involves international relationships among states, friendships and enmities between people, or ties of trust and distrust formed among shareholders, relationship to a third entity tends to be influenced by immediate ties. There is a growing body of literature that aims to connect theories of social structure with network science tools and techniques to study local behaviours and global structures in signed graphs that come up naturally in many unrelated areas. The building block of structural balance is a work by Heider [1] that was expanded into a set of graph-theoretic concepts by Cartwright and Harary [2] to handle a social psychology problem a decade later. The relationship under study has an antonym or dual to be expressed by the opposite sign [3]. In a setting where the opposite of a negative relationship is a positive relationship, a tie to a distant neighbour can be expressed by the product of signs reaching him. Cycles containing an odd number of negative edges are considered to be unbalanced, guaranteeing total balance therefore only in networks containing no such cycles. This strict condition makes it quite unlikely for a signed network to be totally balanced. The literature on signed networks suggests many different formulae to measure balance. These measures are useful for detecting total balance and imbalance, but for intermediate cases their performance is not clear and has not been systematically studied. Our contribution The main focus of this article is to provide insight into measuring partial balance, as much uncertainty still exists on this. The dynamics leading to specific global structures in signed networks remain speculative even after studies with fine-grained approaches. The central thesis of this article is that not all measures are equally useful. We provide a numerical comparison of several measures of partial balance on a variety of undirected signed networks, both randomly generated and inferred from well-known data sets. Using theoretical results for simple classes of graphs, we suggest an axiomatic framework to evaluate these measures and shed light on the methodological details involved in using such measures. This article begins by laying out the theoretical dimensions of the research in Section 2 and looks at basic definitions and terminology. In Section 3, different means of checking for total balance are outlined. Section 4 discusses some approaches to measuring partial balance in Eqs (4.1)–(4.8), categorized into three families of measures 4.1–4.3 and summarized in Table 1. Numerical results on synthetic data are provided in Figs 1 and 2 in Section 5. Section 6 is concerned with analytical results on synthetic data in closed-form formulae in Table 2 and visually represented in Figs 3 and 4. Axioms and desirable properties are suggested in Section 7 to evaluate the measures systematically. Section 8 concerns recommendations for choosing a measure of balance. Numerical results on real signed networks are presented in Section 9. Finally, Section 10 summarizes the study and provides direction for future research. Table 1 Measures of partial balance summarized Measure Name, reference and description $$D(G)$$ Degree of balance [2, 22] A cycle-based measure representing the ratio of balanced cycles $$C(G)$$ Weighted degree of balance [11] An extension of $$D(G)$$ using cycles weighted by a non-increasing function of length $$D_k(G)$$ Relative $$k$$-balance [3, 8] An extension of $$D(G)$$ placing a non-zero weight only on cycles of length $$k$$ $$T(G)$$ Triangle index [6, 13] A triangle-based measure representing the ratio of balanced triangles ($$D_3(G)$$) $$W(G)$$ Walk-based measure of balance [15, 16] A simplified extension of $$D(G)$$ replacing cycles by closed walks $$A(G)$$ Normalized algebraic conflict [6, 19] A normalized measure using least eigenvalue of the Laplacian matrix $$F(G)$$ Normalized frustration index [17, 22] Normalized minimum number of edges whose removal results in balance Measure Name, reference and description $$D(G)$$ Degree of balance [2, 22] A cycle-based measure representing the ratio of balanced cycles $$C(G)$$ Weighted degree of balance [11] An extension of $$D(G)$$ using cycles weighted by a non-increasing function of length $$D_k(G)$$ Relative $$k$$-balance [3, 8] An extension of $$D(G)$$ placing a non-zero weight only on cycles of length $$k$$ $$T(G)$$ Triangle index [6, 13] A triangle-based measure representing the ratio of balanced triangles ($$D_3(G)$$) $$W(G)$$ Walk-based measure of balance [15, 16] A simplified extension of $$D(G)$$ replacing cycles by closed walks $$A(G)$$ Normalized algebraic conflict [6, 19] A normalized measure using least eigenvalue of the Laplacian matrix $$F(G)$$ Normalized frustration index [17, 22] Normalized minimum number of edges whose removal results in balance Table 1 Measures of partial balance summarized Measure Name, reference and description $$D(G)$$ Degree of balance [2, 22] A cycle-based measure representing the ratio of balanced cycles $$C(G)$$ Weighted degree of balance [11] An extension of $$D(G)$$ using cycles weighted by a non-increasing function of length $$D_k(G)$$ Relative $$k$$-balance [3, 8] An extension of $$D(G)$$ placing a non-zero weight only on cycles of length $$k$$ $$T(G)$$ Triangle index [6, 13] A triangle-based measure representing the ratio of balanced triangles ($$D_3(G)$$) $$W(G)$$ Walk-based measure of balance [15, 16] A simplified extension of $$D(G)$$ replacing cycles by closed walks $$A(G)$$ Normalized algebraic conflict [6, 19] A normalized measure using least eigenvalue of the Laplacian matrix $$F(G)$$ Normalized frustration index [17, 22] Normalized minimum number of edges whose removal results in balance Measure Name, reference and description $$D(G)$$ Degree of balance [2, 22] A cycle-based measure representing the ratio of balanced cycles $$C(G)$$ Weighted degree of balance [11] An extension of $$D(G)$$ using cycles weighted by a non-increasing function of length $$D_k(G)$$ Relative $$k$$-balance [3, 8] An extension of $$D(G)$$ placing a non-zero weight only on cycles of length $$k$$ $$T(G)$$ Triangle index [6, 13] A triangle-based measure representing the ratio of balanced triangles ($$D_3(G)$$) $$W(G)$$ Walk-based measure of balance [15, 16] A simplified extension of $$D(G)$$ replacing cycles by closed walks $$A(G)$$ Normalized algebraic conflict [6, 19] A normalized measure using least eigenvalue of the Laplacian matrix $$F(G)$$ Normalized frustration index [17, 22] Normalized minimum number of edges whose removal results in balance Table 2 Balance in minimally and maximally unbalanced graphs $${K}_n^a$$ (6.1) and $${K}_n^c$$ (6.2) $$\mu(G)$$ $${K}_n^a$$ $${K}_n^c$$ $$D(G)$$ $$\sim 1- {2}/{n}$$ $$\sim \frac{1}{2} + (-1)^n e^{-2}$$ $$C(G)$$ $$\sim 1- {1}/{n}$$ $$\sim \frac{1}{2} - \frac{3n\log n}{2^n}$$ $$D_k(G)$$ $$1-{2k}/{n(n-1)}$$ $$0 , 1$$ $$W(G)$$ $$\sim 1-{2}/{n}$$ $$\sim \frac{1+e^{2-2n}}{2}$$ $$A(G)$$ $$\sim 1-{4}/{n^2}$$ $$0$$ $$F(G)$$ $$1-{4}/{n(n-1)}$$ $$\frac{1}{n},\frac{1}{n-1}$$ $$\mu(G)$$ $${K}_n^a$$ $${K}_n^c$$ $$D(G)$$ $$\sim 1- {2}/{n}$$ $$\sim \frac{1}{2} + (-1)^n e^{-2}$$ $$C(G)$$ $$\sim 1- {1}/{n}$$ $$\sim \frac{1}{2} - \frac{3n\log n}{2^n}$$ $$D_k(G)$$ $$1-{2k}/{n(n-1)}$$ $$0 , 1$$ $$W(G)$$ $$\sim 1-{2}/{n}$$ $$\sim \frac{1+e^{2-2n}}{2}$$ $$A(G)$$ $$\sim 1-{4}/{n^2}$$ $$0$$ $$F(G)$$ $$1-{4}/{n(n-1)}$$ $$\frac{1}{n},\frac{1}{n-1}$$ Table 2 Balance in minimally and maximally unbalanced graphs $${K}_n^a$$ (6.1) and $${K}_n^c$$ (6.2) $$\mu(G)$$ $${K}_n^a$$ $${K}_n^c$$ $$D(G)$$ $$\sim 1- {2}/{n}$$ $$\sim \frac{1}{2} + (-1)^n e^{-2}$$ $$C(G)$$ $$\sim 1- {1}/{n}$$ $$\sim \frac{1}{2} - \frac{3n\log n}{2^n}$$ $$D_k(G)$$ $$1-{2k}/{n(n-1)}$$ $$0 , 1$$ $$W(G)$$ $$\sim 1-{2}/{n}$$ $$\sim \frac{1+e^{2-2n}}{2}$$ $$A(G)$$ $$\sim 1-{4}/{n^2}$$ $$0$$ $$F(G)$$ $$1-{4}/{n(n-1)}$$ $$\frac{1}{n},\frac{1}{n-1}$$ $$\mu(G)$$ $${K}_n^a$$ $${K}_n^c$$ $$D(G)$$ $$\sim 1- {2}/{n}$$ $$\sim \frac{1}{2} + (-1)^n e^{-2}$$ $$C(G)$$ $$\sim 1- {1}/{n}$$ $$\sim \frac{1}{2} - \frac{3n\log n}{2^n}$$ $$D_k(G)$$ $$1-{2k}/{n(n-1)}$$ $$0 , 1$$ $$W(G)$$ $$\sim 1-{2}/{n}$$ $$\sim \frac{1+e^{2-2n}}{2}$$ $$A(G)$$ $$\sim 1-{4}/{n^2}$$ $$0$$ $$F(G)$$ $$1-{4}/{n(n-1)}$$ $$\frac{1}{n},\frac{1}{n-1}$$ Fig. 1. View largeDownload slide Partial balance measured by different methods in Erdős–Rényi network with various number of negative edges (colour version online). (a) The mean values of seven measures of partial balance. (b) The mean values of relative $$k$$-balance $$D_k$$. (c) The standard deviation of $$D$$ and $$C$$. (d) The standard deviation of $$W, T, A$$ and $$F.$$ Fig. 1. View largeDownload slide Partial balance measured by different methods in Erdős–Rényi network with various number of negative edges (colour version online). (a) The mean values of seven measures of partial balance. (b) The mean values of relative $$k$$-balance $$D_k$$. (c) The standard deviation of $$D$$ and $$C$$. (d) The standard deviation of $$W, T, A$$ and $$F.$$ Fig. 2. View largeDownload slide Partial balance measured by different methods in 50% negative 4-regular graphs of different orders $$n$$ and decreasing densities $$4/n-1$$. Fig. 2. View largeDownload slide Partial balance measured by different methods in 50% negative 4-regular graphs of different orders $$n$$ and decreasing densities $$4/n-1$$. Fig. 3. View largeDownload slide Partial balance measured by different methods for $${K}_n^a$$ (6.1). Fig. 3. View largeDownload slide Partial balance measured by different methods for $${K}_n^a$$ (6.1). Fig. 4. View largeDownload slide Partial balance measured by different methods for $${K}_n^c$$ (6.2). Fig. 4. View largeDownload slide Partial balance measured by different methods for $${K}_n^c$$ (6.2). 2. Problem statement and notation Throughout this article, the terms signed graph and signed network will be used interchangeably to refer to a graph with positive and negative edges. We use the term cycle only as a shorthand for referring to simple cycles of the graph. While several definitions of the concept of balance have been suggested, this article will only use the original definition for undirected signed graphs unless explicitly stated. We consider an undirected signed network $$G = (V,E,\sigma),$$ where $$V$$ and $$E$$ are the sets of vertices and edges, and $$\sigma$$ is the sign function $$\sigma: E\rightarrow\{-1,+1\}$$. The set of nodes is denoted by $$V$$, with $$|V| = n$$. The set of edges is represented by $$E$$ including $$m^-$$ negative edges and $$m^+$$ positive edges adding up to a total of $$m=m^+ + m^-$$ edges. The signed adjacency matrix and the unsigned adjacency matrix are defined in (2.1) and (2.2). Entries of A are represented by $$a_{ij}$$. \begin{align}\label{eq1} \textbf{A}{_u}{_v} &= \left\{ \begin{array}{@{}ll} \sigma_{u,v} & \mbox{if } {u,v}\in E \\ 0 & \mbox{if } {u,v}\notin E \end{array} \right.\\ \end{align} (2.1) \begin{align} \label{eq1.1} |\textbf{A}|{_u}{_v} &= \left\{ \begin{array}{@{}ll} 1 & \mbox{if } {u,v}\in E \\ 0 & \mbox{if } {u,v}\notin E \end{array} \right. \end{align} (2.2) The positive degree and negative degree of node $$i$$ are denoted by $$d^+ _{i}$$ and $$d^- _{i}$$ representing the number of positive and negative edges connected to node $$i,$$ respectively. They are calculated based on $$d^+ _{i}=(\sum_j |a_{ij}| + \sum_j a_{ij})/2$$ and $$d^- _{i}=(\sum_j |a_{ij}| - \sum_j a_{ij})/2$$. The degree of node $$i$$ is represented by $$d_i$$ and equals the number of edges connected to node $$i$$. It is calculated based on $$d_{i}= d^+ _{i} + d^- _{i}= \sum_j |a_{ij}|$$. A walk of length $$k$$ in $$G$$ is a sequence of nodes $$v_0,v_1,...,v_{k-1},v_k$$ such that for each $$i=1,2,...,k$$ there is an edge from $$v_{i-1}$$ to $$v_i$$. If $$v_0=v_k$$, the sequence is a closed walk of length $$k$$. If all the nodes in a closed walk are distinct except the endpoints, it is a cycle (simple cycle) of length $$k$$. The sign of a cycle is the product of the signs of its edges. A cycle is balanced if its sign is positive and is unbalanced otherwise. The total number of balanced cycles (closed walks) is denoted by $$O_k ^+$$ ($$Q_k ^+$$). Similarly, $$O_k ^-$$ ($$Q_k ^-$$) denotes the total number of unbalanced cycles (closed walks). The total number of cycles (closed walks) is represented by $$O_k=O_k ^+ + O_k ^-$$ ($$Q_k=Q_k ^+ + Q_k ^-$$). 3. Checking for balance It is essential to have an algorithmic means of checking for balance. We recall several known methods here. The characterization of bi-polarity, that a signed graph is balanced if and only if its vertex set can be partitioned into two subsets such that each negative edge joins vertices belonging to different subsets [2], leads to an obvious breadth-first search procedure similar to the usual algorithm for determining whether a graph is bipartite. An alternate algebraic criterion is that the eigenvalues of the signed and unsigned adjacency matrices are equal if and only if the signed network is balanced [4]. For our purposes the following additional method of detecting balance is also important. We define the switching function$$g(X)$$ operating over a set of vertices $$X\subseteq V$$ as follows: \begin{align} \label{eq1.2} \sigma^{g(X)} _{(u,v)}= \left\{ \begin{array}{ll} \sigma_{u,v} & \mbox{if } {u,v}\in X \ \text{or} \ {u,v}\notin X \\ -\sigma_{u,v} & \mbox{if } u \in X \ \text{and} \ v\notin X \ \text{or} \ u \notin X \ \text{and} \ v \in X. \end{array} \right. \end{align} (3.1) As the sign of cycles remains the same when $$g$$ is applied, any balanced graph can switch to an all-positive signature [5]. Accordingly, a balance detection algorithm can be developed by constructing a switching rule on a spanning tree and a root vertex, as suggested in [5]. Finally, another method of checking for balance in connected signed networks makes use of the signed Laplacian matrix defined by $$\textbf{L}=\textbf{D}-\textbf{A}$$ where $$\textbf{D}_{ii} = \sum_j |a_{ij}|$$ is the diagonal matrix of degrees. The signed Laplacian matrix, $$\textbf{L}$$, is positive-semidefinite, that is all of its eigenvalues are non-negative [6]. The smallest eigenvalue of $$\textbf{L}$$ equals 0 if and only if the graph is balanced [7]. 4. Measures of partial balance Several ways of measuring the extent to which a graph is balanced have been introduced by researchers. We discuss three families of measures here and summarize them in Table 1. 4.1 Measures based on cycles The simplest of such measures is the degree of balance suggested by Cartwright and Harary [2], which is the fraction of balanced cycles: \begin{align}\label{eq1.5} D(G)= \frac {\sum \limits_{k=3}^n O_k ^+ } {\sum \limits_{k=3}^n O_k}. \end{align} (4.1) There are other cycle-based measures closely related to $$D(G)$$. The relative $$k$$-balance, denoted by $$D_k(G)$$ and formulated in (4.2) is a cycle-based measure where the sums defining the numerator and denominator of $$D(G)$$ are restricted to a single term of fixed index $$k$$ [3, 8]. The special case $$k=3$$ is called the triangle index, denoted by $$T(G)$$. \begin{align}\label{eq1.6} D_k(G)= \frac {O_k ^+} {O_k}. \end{align} (4.2) Giscard et al. have recently introduced efficient algorithms for counting simple cycles [9] making it possible to use various measures related to $$D_k(G)$$ to evaluate balance in signed networks [10]. A generalization is weighted degree of balance, obtained by weighting cycles based on length as in (4.3), in which $$f(k)$$ is a monotonically decreasing non-negative function of the length of the cycle. \begin{align}\label{eq1.7} C(G)=\frac{\sum \limits_{k=3}^n f(k) O_k ^+ }{\sum \limits_{k=3}^n f(k) O_k} \end{align} (4.3) The selection of an appropriate weighting function is briefly discussed by Norman and Roberts [11], suggesting functions such as $$1/k,1/k^2,1/2^k$$, but no objective criterion for choosing such a weighting function is known. We consider two weighting functions $$1/k$$ and $$1/k!$$ for evaluating $$C(G)$$ in this article. Given the typical distribution of cycles of different length, the function $$f(k)=1/k$$ provides a rate of decay slower than the typical rate of increase in cycles of length $$k$$, resulting in $$C(G)$$ being dominated by longer cycles. The function $$f(k)=1/k!$$ provides a rate of decay faster than the typical rate of increase in cycles of length $$k$$, resulting in $$C(G)$$ being mostly determined by shorter cycles. Although fast algorithms are developed for counting and listing cycles of undirected graphs [9, 12], the number of cycles grows exponentially with the size of a typical real-world network. To tackle the computational complexity, Terzi and Winkler [13] used $$D_3(G)$$ in their study and replaced the triangles by closed walks of length $$3$$. The triangle index can be calculated efficiently by the formula in (4.4) where $${\mathrm{Tr}}(A)$$ denotes the trace of $$A$$. \begin{align}\label{eq1.8} T(G)= D_3(G) = \frac{O_3 ^+}{O_3} = \frac{{\mathrm{Tr}}(A^3)+{\mathrm{Tr}}(|A|^3)}{2 {\mathrm{Tr}}(|A|^3)} \end{align} (4.4) The relative signed clustering coefficient is suggested as a measure of balance by Kunegis [6], taking insight from the classic clustering coefficient. After normalization, this measure is equal to the triangle index. Having access to an easy-to-compute formula [13] for $$T(G)$$ obviates the need for a clustering-based calculation which requires iterating over all triads in the graph. Bonacich argues that dissonance and tension are unclear in cycles of length greater than three [14], justifying the use of the triangle index to analyse structural balance. However the neglected interactions may represent potential tension and dissonance, though not as strong as that represented by unbalanced triads. One may consider a smaller weight for longer cycles, thereby reducing their impact rather than totally disregarding them. Note that $$C(G)$$ is a generalization of both $$D(G)$$ and $$D_k(G)$$. In all the cycle-based measures, we consider a value of $$1$$ for the case of division by 0. This allows the measures $$D(G)$$ and $$C(G)$$ ($$D_{k}(G)$$) to provide a value for acyclic graphs (graphs with no $$k$$-cycle). 4.2 Measures related to eigenvalues Beside checking cycles, there are computationally easier approaches to structural balance such as the walk-based approach. The walk-based measure of balance is suggested by Pelino and Maimone [15] with more weight placed on shorter closed walks than the longer ones. Let $${\mathrm{Tr}}(e^\textbf{A})$$ and $${\mathrm{Tr}}(e^{|\textbf{A}|})$$ denote the trace of the matrix exponential for $$\textbf{A}$$ and $$|\textbf{A}|$$ respectively. In Equation (4.5), closed walks are weighted by a function with a relatively fast rate of decay compared to functions suggested in [11]. The weighted ratio of balanced to total closed walks is formulated in (4.5). \begin{align}\label{eq2} W(G)= \frac {K(G)+1}{2}, \quad K(G)=\frac{\sum \limits_{k}\frac{Q_k ^+ - Q_k ^-}{k!}}{\sum \limits_{k} \frac{Q_k ^+ + Q_k ^-}{k!}}=\frac{{\mathrm{Tr}}(e^\textbf{A})}{{\mathrm{Tr}}(e^{|\textbf{A}|})}. \end{align} (4.5) Regarding the calculation of $${\mathrm{Tr}}(e^\textbf{A})$$, one may use the standard fact that $$\textbf{A}$$ is a symmetric matrix for undirected graphs. It follows that $${\mathrm{Tr}}(e^\textbf{A})=\sum _{i} e^{\lambda_i}$$ in which $$\lambda_i$$ ranges over eigenvalues of $$\textbf{A}$$. The idea of a walk-based measure was then used by Estrada and Benzi [16]. They have tested their measure on five signed networks resulting in values inclined towards imbalance which were in conflict with some previous observations [6, 17]. The walk-based measure of balance suggested in [16] have been scrutinized in the subsequent studies [10, 18]. Giscard et al. discusses how using closed walks in which the edges might be repeated results in mixing the contribution of various cycle lengths and leads to values that are difficult to interpret [10]. Singh et al. criticize the walk-based measure from another perspective and explains how the inverse factorial weighting distorts the measure towards showing imbalance [18]. The idea of another eigenvalue-based measure comes from spectral graph theory [19]. The smallest eigenvalue of the signed Laplacian matrix provides a measure of balance called algebraic conflict [19]. Algebraic conflict, denoted by $$\lambda(G)$$, equals zero if and only if the graph is balanced. Positive-semidefiniteness of $$\textbf{L}$$ results in $$\lambda(G)$$ representing the amount of imbalance in a signed network. Algebraic conflict is used in [6] to compare the level of balance in online signed networks of different sizes. Moreover, Pelino and Maimone analysed signed network dynamics based on $$\lambda(G)$$ [15]. Bounds for $$\lambda(G)$$ are investigated by [7] leading to recent applicable results in [20, 21]. Belardo and Zhou prove that $$\lambda(G)$$ for a fixed $$n$$ is maximized by the complete all-negative graph of order $$n$$ [21]. Belardo shows that $$\lambda(G)$$ is bounded by $$\lambda_\text{max}(G)=\overline{d}_{\text{max}} -1$$ in which $$\overline{d}_{\text{max}}$$ represents the maximum average degree of endpoints over graph edges [20]. We use this upper bound to normalize algebraic conflict. Normalized algebraic conflict, denoted by $$A(G)$$, is expressed in (4.6). \begin{align}\label{eq3} A(G)=1- \frac {\lambda(G)}{\overline{d}_{\text{max}} -1} , \quad \overline{d}_{\text{max}}=\max_{(u,v)\in E} (d_u+d_v)/2. \end{align} (4.6) 4.3 Measures based on frustration A quite different measure is the frustration index. Originally proposed for applications on ferromagnetic molecules, it is also referred to as the line index for balance by [22]. A set $$E^*$$ of edges is called deletion-minimal if deleting all edges in $$E^*$$ results in a balanced graph, but no proper subset of $$E^*$$ has this property. Each edge in $$E^*$$ lies on an unbalanced cycle and every unbalanced cycle of the network contains an odd number of edges in $$E^*$$ [5]. Iacono et al. showed that $$L(G)$$ equals the minimum number of unbalanced fundamental cycles induced over all spanning trees of the graph [23]. The graph resulted from deleting all edges in $$E^*$$ is called a balanced transformation of a signed graph. The frustration index equals the minimum cardinality among deletion-minimal sets as in (4.7). \begin{align}\label{eq5.0} L(G)=\min \limits_{E^*} {|E^*|.} \end{align} (4.7) Similarly, in a setting where each vertex is given a black or white colour, if the endpoints of positive (negative) edges have different colours (same colour), they are ‘frustrated’. The frustration index is therefore the smallest number of frustrated edges over all possible 2-colourings of the nodes. $$L(G)$$ is hard to compute as the special case with all edges being negative is equivalent to the MAXCUT problem [24], which is known to be NP-hard. There are upper bounds for the frustration index such as $$L(G)\leq m^-$$ which states the obvious result of removing all negative edges. Facchetti et al. have used computational methods related to Ising spin glass models to estimate the frustration index in relatively large online social networks [17]. Using an estimation of the frustration index obtained by a heuristic algorithm, they concluded that the online signed networks are extremely close to total balance; an observation that contradicts some other research studies like [16]. The number of frustrated edges in special Erdős–Rényi graphs is analysed by El Maftouhi et al. [25]. It follows a binomial distribution with parameters $$n(n-1)/2$$ and $$p/2$$ in which $$p$$ represents equal probabilities for positive and negative edges in Erdős–Rényi graph. Therefore, the expected number of frustrated edges is $$n(n-1)p/4$$. They also prove that such a network is almost always not balanced when $$p \geq (\log2)/n$$. It is straightforward to prove that frustration index is equal to the minimum number of negative edges over all switching functions [5]. Moreover, if $$m^- {(G^{g(X)})}=L(G)$$ then every vertex under switching $${g(X)}$$ satisfies $$d^- _{v^{g(X)}} \leq d^+ _{v^{g(X)}}$$. Tomescu [26] proves that the frustration index is bounded by $$\left\lfloor \left(n-1 \right)^2/4 \right\rfloor$$. Bounds for the largest number of frustrated edges for a graph with $$n$$ nodes and $$m$$ edges over all possible 2-colourings of the nodes are provided by [27]. It that $${L(G)} \leq {m}/{2}$$; an upper bound that is not necessarily tight. Another upper bound for the frustration index is reported in [23] referred to as the worst-case upper bound on the consistency deficit. However, the frustration index values in complete graphs with all negative edges shows that the upper bound is incorrect. In order to compare with the other indices which take values in the unit interval and give the value $$1$$ for balanced graphs, we suggest normalized frustration index, denoted by $$F(G)$$ and formulated in (4.8). \begin{align}\label{eq5.1} F(G)=1- \frac {L(G)}{m/2} \end{align} (4.8) Using a different upper bound for normalizing the frustration index, we discuss another frustration-based measure in Section 6.3 and formulate it in Eq. 6.1. 4.4 Other methods of evaluating balance Balance can also be analysed by blockmodelling based on iteratively calculating Pearson moment correlations from the columns of $$\textbf{A}$$ [28]. Blockmodelling reveals increasingly homogeneous sets of vertices. Doreian and Mrvar discuss this approach in partitioning signed networks [29]. Applying the method to Correlates of War data on positive and negative international relationships, they refute the hypothesis that signed networks gradually move towards balance using blockmodelling alongside some variations of $$D(G)$$ and $$L(G)$$ [30]. Moreover, there are probabilistic methods that compare the expected number of balanced and unbalanced triangles in the signed network and its reshuffled version [31–34]. As long as these measures are used to evaluate balance, the result will not be different to what $$T(G)$$ provides alongside a basic statistical testing of its value against reshuffled networks. Some researchers suggest that studying the structural dynamics of signed networks is more important than measuring balance [35, 36]. This approach is usually associated with considering an energy function to be minimized by local graph operations decreasing the energy. However, the energy function is somehow a measure of network imbalance which requires a proper definition and investigation of axiomatic properties. Seven measures of partial balance investigated in this article are outlined in Table 1. Outline of the rest of the article We started by discussing balance in signed networks in Sections 2 and 3 and reviewed different measures in Section 4. We will provide some observations on synthetic data in Figs 1–4 in Sections 5 and 6 to demonstrate the values of different measures. The reader who is not particularly interested in the analysis of measures using synthetic data may directly go to Section 7 in which we introduce axioms and desirable properties for measures of partial balance. In Section 8, we provide some recommendations on choosing a measure and discuss how using unjustified measures has led to conflicting observations in the literature. The numerical results on real-signed networks are presented in Section 9. 5. Numerical results on synthetic data In this section, we start with a brief discussion on the relationship between negative edges and imbalance in networks. According to the definition of structural balance, all-positive signed graphs (merely containing positive edges) are totally balanced. Intuitively, one may expect that all-negative signed graphs are very unbalanced. Perhaps another intuition derived by assuming symmetry is that increasing the number of negative edges in a network reduces partial balance proportionally. We analyse partial balance in randomly generated graphs to show neither of the intuitions is correct. Our motivation for analysing balance in such graphs is to gain an understanding of the behaviour of the measures and their connections with signed graph parameters like $$m^-,n$$ and density. We use randomly generated graphs as the underlying structure is not important for this analysis. 5.1 Erdős–Rényi random network with various $$m^-$$ We calculate measures of partial balance, denoted by $$\mu(G)$$, for an Erdős–Rényi random network with a various number of negative edges. Figure 1 demonstrates the partial balance measured by different methods. For each data point, we report the average of 50 runs, each assigning negative edges at random to the fixed underlying graph. The subfigures (c) and (d) of Fig. 1 show the mean along with $$\pm 1$$ standard deviation. Measures $$D(G)$$ and $$C(G)$$ with $$f(k)=1/k$$ are observed to tend to $$0.5$$ where $$m^- > 5$$, not differentiating partial balance in graphs with a non-trivial number of negative edges. Given the typical distribution of cycles of different length, we expect $$D(G)$$ and $$C(G)$$ with $$f(k)=1/k$$ to be mostly determined by longer cycles that are much more frequent. For this particular graph, cycles with a length of 10 and above account for more than 96% of the total cycles in the graph. Such long cycles tend to be balanced roughly half the time for almost all values of $$m^-$$ (for all the values within the range of $$5 \leq m^- \leq 45$$ in the network considered here). The perfect overlap of data points for $$D(G)$$ and $$C(G)$$ with $$f(k)=1/k$$ in Fig. 1 shows that using a linear rate of decay does not make a difference. One may think that if we use $$D_k(G)$$ which does not mix cycles of different length, it may circumvent the issues. However, subfigure (b) of Fig. 1 demonstrating values of $$D_k(G)$$ for different cycle lengths shows the opposite. It shows not only does $$D_k(G)$$ not resolve the problems of lack of sensitivity and clustering around $$0.5$$, but it behaves unexpectedly with substantially different values based on the parity of $$k$$ when $$m^- > 35$$. $$C(G)$$ weighted by $$f(k)=1/k!$$ and mostly determined by shorter cycles, decreases slower than $$D(G)$$ and then provides values close to $$0.5$$ for $$m^-\geq10$$. $$W(G)$$ drops below $$0.6$$ for $$m^-=10$$ and then clusters around $$0.55$$ for $$m^->10$$. $$T(G)$$ is the measure with a wide range of values symmetric to $$m^-$$. The single most striking observation to emerge is that $$A(G)$$ seems to have a completely different range of values, which we discuss further in Section 6.3. A steady linear decrease is observed from $$F(G)$$ for $$m^-\leq10$$. 5.2 Four-regular random networks of different orders To investigate the impacts of graph order (number of nodes) and density on balance, we computed the measures for randomly generated 4-regular graphs with 50% negative edges. Intuitively, we expect values to have low variation and no trends for similarly structured graphs of different orders. Figure 2 demonstrates the analysis in a setting where the degree of all the nodes remains constant, but the density ($$4/n-1$$) is decreasing in larger graphs. For each data point the average and standard deviation of 100 runs are reported. In each run, negative weights are randomly assigning to half of the edges in a fixed underlying 4-regular graph of order $$n$$. According to Fig. 2, the four measures differ not only in the range of values, but also in their sensitivity to the graph order and density. First, $$W(G) \rightarrow 1$$ when $$n \rightarrow \infty$$ for larger graphs although the graphs are structurally similar, which goes against intuition. Clustered around $$0.5$$ is $$T(G)$$ which features a substantial standard deviation for 4-regular random graphs. Values of $$A(G)$$ are around $$0.8$$ and do not seem to change substantially when $$n$$ increases. $$F(G)$$ provides stationary values around $$0.7$$ when $$n$$ increases. While $$\lambda(G)$$ and $$L(G)$$ depend on the graph order and size, the relative constancy of $$A(G)$$ and $$F(G)$$ values suggest the normalized measures $$A(G)$$ and $$F(G)$$ are largely independent of the graph size and order, as our intuition expects. We further discuss the normalization of $$A(G)$$ and $$F(G)$$ in Section 6.3. 6. Analytical results on synthetic data In this section, we analyse the capability of measuring partial balance in some families of specially structured graphs. Closed-form formulae for the measures in specially structured graphs are provided in Table 2. The two families of complete signed graphs that we investigate are as follows in 6.1 and 6.2. 6.1 Minimally unbalanced complete graphs with a single negative edge The first family includes complete graphs with a single negative edge, denoted by $${K}_n^a$$. Such graphs are only one edge away from a state of total balance. It is straight-forward to provide closed-form formulae for $$\mu({K}_n^a)$$ as expressed in (A.4) – (A.10) in Appendix A. In $${K}_n^a$$, intuitively we expect $$\mu({K}_n^a)$$ to increase with $$n$$ and $$\mu({K}_n^a) \rightarrow 1$$ as $$n \rightarrow \infty$$. We also expect the measure to detect the imbalance in $${K}_3^a$$ (a triangle with one negative edge). Figure 3 demonstrates the behaviour of different indices for complete graphs with one negative edge. $$W({K}_n^a)$$ gives unreasonably large values for $$n < 5$$. Except for $$W({K}_n^a)$$, the measures are co-monotone over the given range of $$n$$. 6.2 Maximally unbalanced complete graphs with all-negative edges The second family of specially structured graphs to analyse includes all-negative complete graphs denoted by $${K}_n^c$$. The indices are calculated in (A.12)–(A.18) in Appendix A. Our calculations for $$L({K}_n^c)$$ show that the upper bound suggested for the frustration index in [23] is incorrect. Intuitively, we expect $$\mu({K}_n^c) \rightarrow 0$$ as $$n \rightarrow \infty$$. Figure 4 illustrates $$D({K}_n^c)$$ oscillating around $$0.5$$ and $$W({K}_n^c),C({K}_n^c) \rightarrow 0.5$$ as $$n$$ increases. We explain the oscillation of $$D({K}_n^c)$$ in Appendix A. Clearly, measures $$D(G)$$, $$C(G)$$ and $$W(G)$$ do no work well for the family of graphs considered here. Figure 4 shows that $$F({K}_n^c) \rightarrow 0$$ as $$n \rightarrow \infty$$ as expected based on Table 2. 6.3 Normalization of the measures It is worth mentioning that measures of partial balance may lead to different maximally unbalanced complete graphs. Based on $$\lambda(G)$$ and $$L(G)$$, $${K}_n^c$$ represents the family of maximally unbalanced graphs, while it is merely one family among the maximally unbalanced graphs according to $$T(G)$$. Estrada and Benzi have found complete graphs comprised of one cycle of $$n$$ positive edges with the remaining pairs of nodes connected by negative edges to be a family of maximally unbalanced graphs based on $$W(G)$$ [16], while this argument is not supported by any other measures. As the signs of cycles in a graph are not independent, the structure of maximally unbalanced graphs under the cycle-based measures, $$D(G)$$ and $$C(G)$$, is not known. A simple comparison of $$L({K}_n^c)$$ (calculations provided in (A.17) in Appendix A) and the proposed upper bound $$m/2=(n^2-n)/4$$ reveals substantial gaps. These gaps equal $$n/4$$ for even $$n$$ and $$(n-1)/4$$ for odd $$n$$. This supports the previous discussions on looseness of $$m/2$$ as an upper bound for frustration index. Assuming $${K}_n^c$$ to be ‘the maximally unbalanced graphs’ under $$L(G)$$, $$\lfloor m/2-(n-1)/4 \rfloor$$ would be a tight upper bound for the frustration index. This allows a modified version of normalized frustration index, denoted by $$F^\prime (G)$$ and defined in (6.1), to take the value zero for $${K}_n^c$$. \begin{align} \label{eq5.8} F^\prime (G)=1-L(G)/ \lfloor m/2-(n-1)/4 \rfloor \end{align} (6.1) Similarly, the upper bound, $$\lambda_\text{max}(G)$$, used to normalize algebraic conflict, is not tight for many graphs. For instance, in the Erdős–Rényi studied in Section 5 with $$m=m^-$$, the existence of an edge with $$\overline{d}_{\text{max}}=9$$ makes $$\lambda_\text{max}(G)=8$$, while $$\lambda(G)=1.98$$. The two observations mentioned above suggest that tighter upper bounds can be used for normalization. However, the statistical analysis we use in Section 9 to evaluate balance in real networks is independent of the normalization method, so we do not pursue this question further now. 6.4 Expected values of the cycle-based measures Relative $$k$$-balance, $$D_k (G)$$, is proved by El Maftouhi et al. [25] to tend to $$0.5$$ for Erdős–Rényi graphs such that the probability of an edge being negative is equal to $$1/2$$. Moreover, Giscard et al. discuss the probability distribution of $$1 - D_k(G)$$. Their discussion is based on a model in which sign of any edge is negative with a fixed probability [10, Section 4.2]. We use the same model to present some simple observations that appear not to have been noticed by previous authors advocating for the use of cycle-based measures. We are going to take a different approach from that of Giscard et al. and merely calculate the expected values of cycle-based measures in general, rather than the full distribution under additional assumptions. Note that, for an arbitrary graph, $${O_k ^+}/ {O_k}$$ gives the probability that a randomly chosen $$k$$-cycle is balanced and is denoted by $$B_{(k,q)}$$. Theorem 6.1 Let $$G$$ be a graph and consider the sign function obtained by independently choosing each edge to be negative with probability $$q$$, and positive otherwise. Then \begin{align} E(D_k(G)) & = (1+{(1-2q)}^k)/2. \label{eq1.9} \end{align} (6.2) Proof. Note that a cycle is balanced if and only if it has an even number of negative edges. Thus $$E\left(\frac {O_k ^+} {O_k}\right) = \sum \limits_{i \: \text{even}} {\binom{k}{i}} q^i (1-q)^{k-i}$$ (compare with [10, Eq. 4.1]). This simplifies to the stated formula (details of calculations are given in Appendix A). □ Note that the expected values are independent of the graph structure and obtaining them does not require making any assumptions on the signs of cycles being independent random variables. As the signs of the edges are independent random variables, the expected value of $$B_{(k,q)}$$ can be obtained by summing on all cases having an even number of negative signs in the $$k$$-cycle. Based on (6.2), $$E(D_k(G)) = 1$$ when $$q = 0$$ and $$E(D_k(G)) = 0.5$$ when $$q = 0.5$$ supporting our intuitive expectations. However, when $$q = 1$$, $$E(D_k(G))$$ takes extremal values based on the parity of $$k$$ which is a major problem as previously observed in the subfigure (b) of Fig. 1. It is clear to see that the parity of $$k$$ makes a substantial difference to $$D_k(G)$$ when a considerable proportion of edges are negative. Theorem 6.2 Let $$G$$ be a graph and consider the sign function obtained by independently choosing each edge to be negative with probability $$q$$, and positive otherwise. Then \begin{align} E(D(G)) & = \frac{1}{2} \frac { \sum \limits_{k=3}^n (1+{(1-2q)}^k)(O_k) } {\sum \limits_{k=3}^n O_k} \label{eq1.10} \end{align} (6.3) Proof. The random variable $${O_k ^+}$$ can be written as $${O_k ^+}= B_{(k,q)}.{O_k}$$. Taking expected value from the two sides gives $$E({O_k ^+})= {O_k}.E(B_{(k,q)})$$ as $$O_k$$ is a constant for a fixed $$k$$. This completes the proof using the result from Theorem 6.1. □ Note that the exponential decay of the factor $$(1-2q)^k$$ reduces the contribution for large $$k$$, and small values of $$k$$ will dominate for many graphs. For example, if $$q=0.2$$ the expression for $$E(D(G))$$ simplifies to $$\frac{1}{2} + \frac{1}{2} \frac{ \sum \limits_{k=3}^n{0.6}^k O_k}{\sum \limits_{k=3}^n O_k}.$$ For many graphs encountered in practice, $$O_k$$ will grow exponentially with $$k$$, but at a rate less than $$1/0.6$$, so the tail contribution will be small. Larger values of $$q$$ only make this effect more pronounced. Thus we expect that $$E(D(G))$$ will often be very close to $$0.5$$ in signed graphs with a reasonably large fraction of negative edges (we have already seen such a phenomenon in Section 5.1). A similar conclusion can be made for $$C(G)$$. This casts doubt on the usefulness of the measures that mix cycles of different length whether weighted or not. While we have also observed many problems involving values of cycle-based measures on synthetic data in other parts of Sections 5 and 6, we will continue evaluating their axiomatic properties in Section 7 and then summarize the methodological findings in Section 8. 7. Axiomatic framework of evaluation The results in Sections 5 and 6 indicate that the choice of measure substantially affects the values of partial balance. Besides that, the lack of a standard measure calls for a framework of comparing different methods. Two different sets of axioms are suggested in [11], which characterize the measure $$C(G)$$ inside a smaller family (up to the choice of $$f(k)$$). Moreover, the theory of structural balance itself is axiomatized in [37]. However, to our knowledge, axioms for general measures of balance have never been developed. Here we provide the first set of axioms and desirable properties for measures of partial balance, in order to shed light on their characteristics and performance. 7.1 Axioms for measures of partial balance We define a measure of partial balance to be a function $$\mu$$ taking each signed graph to an element of $$[0,1]$$. Worthy of mention is that some of these measures were originally defined as a measure of imbalance (algebraic conflict, frustration index and the original walk-based measure) calibrated at $$0$$ for completely balanced structures, so that some normalization was required, and perhaps our normalization choices can be improved on (see Section 6.3). As the choice of $$m/2$$ as the upper bound for normalizing the line index of balance was somewhat arbitrary, another normalized version of frustration index is defined in (7.1). \begin{align} \label{eq6} X(G)=1-L(G)/{m^-} \end{align} (7.1) Before listing the axioms, we justify the need for an axiomatic evaluation of balance measures. As an attempt to understand the need for axiomatizing measures of balance, we introduce two unsophisticated and trivial measures that come to mind for measuring balance. The fraction of positive edges, denoted by $$Y(G)$$, is defined in (7.2) on the basis that all-positive signed graphs are balanced. Moreover, a binary measure of balance, denoted by $$Z(G)$$, is defined in (7.3). While $$Y(G)$$ and $$Z(G)$$ appear to be irrelevant, there is currently no reason not to use such measures. \begin{gather} \label{eq7} Y(G)=m^{+}/m \quad \\ \label{eq8} \end{gather} (7.2) \begin{gather} Z(G) = \left\{ \begin{array}{ll} 1 & \mbox{if } G \, \mbox{is totally balanced} \\ 0 & \mbox{if } G \, \mbox{is not balanced}. \end{array} \right. \end{gather} (7.3) We consider the following notation for referring to basic operations on signed graphs: $$G^{g(X)}$$ denotes signed graph $$G$$ switched by $$g(X)$$ (switched graph). $$G\oplus H$$ denotes the disjoint union of two signed graphs $$G$$ and $$H$$ (disjoint union). $$G\ominus e$$ denotes $$G$$ with $$e$$ deleted (removing an edge). $$G\ominus E^*$$ denotes $$G$$ with deletion-minimal edges deleted (balanced transformation). $$G\oplus C^+_3$$ denotes the disjoint union of graphs $$G$$ and a positive 3-cycle (adding a balanced 3-cycle). $$G\oplus C^-_3$$ denotes the disjoint union of graphs $$G$$ and a negative 3-cycle (adding an unbalanced 3-cycle). $$e\in E^*$$ denotes an edge in the deletion-minimal set. $$G \ominus E^* \oplus e$$ denotes the balanced transformation of a graph with an edge $$e$$ added to it. We list the following axioms: A1 $$0 \leq \mu(G) \leq 1$$. A2 $$\mu(G) = 1$$ if and only if $$G$$ is balanced. A3 If $$\mu(G) \leq \mu(H)$$, then $$\mu(G) \leq \mu(G\oplus H) \leq \mu(H)$$. A4 $$\mu(G^{g(X)}) = \mu(G)$$. The justifications for such axioms are connected to very basic concepts in balance theory. We consider A1 essential in order to make meaningful comparisons between measures. Introducing the notion of partial balance, we argue that total balance, being the extreme case of partial balance, should be denoted by an extremal value as in A2. In A3, the argument is that overall balance of two disjoint graphs is bounded between their individual balances. This also covers the basic requirement that the disjoint union of two copies of graph $$G$$ must have the same value of partial balance as $$G$$. Switching nodes should not change balance [5] as in A4. Table 3 shows how some measures fail on particular axioms. The results provide important insights into how some of the measures are not suitable for measuring partial balance. A more detailed discussion on the proof ideas and counterexamples related to Table 3 is provided in Appendix B. Table 3 Different measures satisfying (✓) or failing (✗) axioms $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ A1 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ A2 ✓ ✓ ✓ ✗ ✗ ✓ ✓ ✗ ✓ A3 ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✓ A4 ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗ ✓ $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ A1 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ A2 ✓ ✓ ✓ ✗ ✗ ✓ ✓ ✗ ✓ A3 ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✓ A4 ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗ ✓ Table 3 Different measures satisfying (✓) or failing (✗) axioms $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ A1 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ A2 ✓ ✓ ✓ ✗ ✗ ✓ ✓ ✗ ✓ A3 ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✓ A4 ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗ ✓ $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ A1 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ A2 ✓ ✓ ✓ ✗ ✗ ✓ ✓ ✗ ✓ A3 ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✓ A4 ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗ ✓ 7.2 Some other desirable properties We also consider four desirable properties that formalize our expectations of a measure of partial balance. We do not consider the following as axioms in that they are based on adding or removing 3-cycles and edges which may bias the comparison in favour of cycle-based and frustration-based measures. Positive and negative 3-cycles are very commonly used to explain the theory of structural balance which makes B1 and B2 obvious requirements. Removing an edge from a set of edges whose removal results in balance, should not decrease balance as in B3. Finally, adding such an edge should not increase balance as in B4. B1 If $$\mu(G) \neq 1$$, then $$\mu(G\oplus C^+_3) > \mu(G)$$. B2 If $$\mu(G) \neq 0$$, then $$\mu(G\oplus C^-_3) < \mu(G)$$ B3 If $$e\in E^*$$, then $$\mu(G \ominus e) \geq \mu(G)$$. B4 If $$\mu(G)\neq 0$$ and $$\mu(G \ominus E^* \oplus e)\neq 1$$, then $$\mu(G \oplus e) \leq \mu(G)$$. Table 4 shows how some measures fail on particular desirable properties. It is worth mentioning that the evaluation in Tables 3–4 is somewhat independent of parametrization: for each strictly increasing function $$h$$ such that $$h(0)=0$$ and $$h(1)=1$$, the results in Tables 3–4 hold for $$h(\mu(G))$$. Proof ideas and counterexamples related to Table 4 is provided in Appendix B. Table 4 Different measures satisfying (✓) or failing (✗) desirable properties $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ B1 ✓ ✓ ✓ ✗ ✓ ✓ ✗ ✗ ✗ B2 ✓ ✓ ✗ ✗ ✗ ✗ ✗ ✗ ✓ B3 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✗ B4 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✓ $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ B1 ✓ ✓ ✓ ✗ ✓ ✓ ✗ ✗ ✗ B2 ✓ ✓ ✗ ✗ ✗ ✗ ✗ ✗ ✓ B3 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✗ B4 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✓ Table 4 Different measures satisfying (✓) or failing (✗) desirable properties $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ B1 ✓ ✓ ✓ ✗ ✓ ✓ ✗ ✗ ✗ B2 ✓ ✓ ✗ ✗ ✗ ✗ ✗ ✗ ✓ B3 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✗ B4 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✓ $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ B1 ✓ ✓ ✓ ✗ ✓ ✓ ✗ ✗ ✗ B2 ✓ ✓ ✗ ✗ ✗ ✗ ✗ ✗ ✓ B3 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✗ B4 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✓ Another desirable property, which we have not formulated as a formal requirement owing to its vagueness, is that the measure takes on a wide range of values. For example, $$D(G)$$ and $$C(G)$$ tend rapidly to $$0.5$$ as $$n$$ increases which makes their interpretation and possibly comparison with other measures difficult. A possible way to formalize it would be expecting $$\mu(G)$$ to give $$0$$ and $$1$$ on each complete graph of order at least $$3$$, for some assignment of signs of edges. This condition would be satisfied by $$T(G)$$ and $$A(G)$$, as well as $$F^\prime(G)$$. However, $$D(G),C(G)$$ and $$W(G)$$ would not satisfy this condition due to the existence of balanced cycles and closed walks in complete signed graphs of general orders. Moreover, the very small standard deviation of $$D(G)$$, $$C(G)$$ and $$W(G)$$ makes statistical testing against the balance of reshuffled networks complicated. The measures $$D(G)$$, $$C(G)$$ and $$W(G)$$ also have shown some unexpected behaviours on various types of graphs discussed in Sections 5 and 6. 8. Discussion on methodological findings Taken together, the findings in Sections 5–7 give strong reason not to use cycle-based measures $$D(G)$$ and $$C(G)$$, regardless of the weights. The major issues with cycle-based measures $$D(G)$$ and $$C(G)$$ include the very small variance in randomly generated and reshuffled graphs, lack of sensitivity and clustering of values around 0.5 for graphs with a non-trivial number of negative edges. Recall the numerical analysis of synthetic data in Section 5, analytical results on the expected values of cycle-based measures in Section 6.4, and the numerical values which are difficult to interpret like the oscillation of $$D(G)$$ and values of $$C(G)$$ for $${K}_n^c$$ graphs in Table 2 and Fig. 4. The relative $$k$$-balance which is ultimately from the same family of measures, seems to resolve some, but not all the problems discussed above. However, it fails on several axioms and desirable properties. It is easy to compute $$D_{3}(G)$$ based on closed walks of length 3 [13] and there are recent methods resolving the computational burden of computing $$D_{k}(G)$$ for general $$k$$ [9, 10]. However, $$D_{k>3}(G)$$ cannot be used for cyclic graphs that do not have $$k$$-cycles. Besides, for networks with a large proportion of negative edges, the parity of $$k$$ substantially distorts the values of $$D_{k}(G)$$. Accepting all these shortcomings, one may use $$D_{k}(G)$$ when cycles of a particular length have a meaningful interpretation in the context of study. Walk-based measures like $$W(G)$$ require a more systematic way of weighting to correct for the double-counting of closed walks with repeated edges. The shortcomings of $$W(G)$$ involving the weighting method and contribution of non-simple cycles are also discussed in [10, 18]. Recall that $$W(G) \rightarrow 1$$ in 4-regular graphs when we increase $$n$$ as in our discussion in Section 5.2. Besides, $$W({K}_n^c) \rightarrow 0.5$$ as $$n$$ increases as discussed in Section 6.2. The commonly-observed clustering of values near 0.5 may also present problems. Moreover, the model behind $$W(G)$$ is strange as signs of closed walks do not represent balance or imbalance. For these reasons we do not recommend $$W(G)$$ for future use. The major weakness of the normalized algebraic conflict, $$A(G)$$, seems to be its incapability of evaluating the overall balance in graphs that have more than one connected component. Note that, some of the failures observed for $$A(G)$$ on axioms and desirable properties stem from its dependence on $$\lambda(G)$$ the smallest eigenvalue of the signed Laplacian matrix. $$\lambda(G)$$ might be determined by a component of the graph disconnected from other components and in turn not capturing the overall balance of the graph as a whole. For analysing graphs with just one cyclic connected component, one may use $$A(G)$$ while disregarding the acyclic components. However, if a graph has more than one cyclic connected component, using $$A(G)$$ or $$\lambda(G)$$ is similar to disregarding all but the most balanced connected component in the graph. The three trivial measures, namely $$X(G)$$, $$Y(G)$$ and $$Z(G)$$, fail on various basic axioms and desirable properties in Tables 3 and 4, and also show a lack of sensitivity to the graph, making them inappropriate to be used as measures of balance. Satisfying almost all the axioms and desirable properties, $$F(G)$$ seems to measure something different from what is obtained using all cycles or all $$k$$-cycles, and be worth pursuing in future. Note that, $$L(G)$$ equals the minimum number of unbalanced fundamental cycles [23]; suggesting a connection between the frustration and unbalanced cycles yet to be explored further. We recommend using $$F(G)$$ for graphs up to the largest size and order we have considered in this study. Recently developed optimization models [38] are shown to be capable of computing the frustration index in graphs of up to thousands of nodes and edges. For larger graphs, exact computation of $$L(G)$$ would be time consuming and it can be approximated using a non-zero optimality gap tolerance with the optimization models in [38]. Alternatively, $$A(G)$$ and $$D_k(G)$$ seem to be the other options. Depending on the type of the graph, $$k$$-cycles might not necessarily capture global structural properties. For instance, this would make $$D_3(G)$$ an improper choice for some specific graphs like sparse 4-regular graphs (as in Section 5.2), square grids and sparse graphs with a small number of 3-cycles. Similarly, $$A(G)$$ is not suitable for graphs that have more than one connected component (including many sparse graphs). Notes on previous work In the literature, balance theory is widely used on directed signed graphs. It seems that this approach is questionable in two ways. First, it neglects the fact that many edges in signed digraphs are not reciprocated. Bearing that in mind, investigating balance theory in signed digraphs deals with conflict avoidance when one actor in such a relationship may not necessarily be aware of good will or ill will on the part of other actors. This would make studying balance in directed networks analogous to studying how people avoid potential conflict resulting from potentially unknown ties. Secondly, balance theory does not make use of the directionality of ties and the concepts of sending and receiving positive and negative links. Leskovec et al. compare the reliability of predictions made by competing theories of social structure: balance theory and status theory (a theory that explicitly includes direction and gives quite different predictions) [31]. The consistency of these theories with observations is investigated through large signed directed networks such as Epinions, Slashdot and Wikipedia. The results suggest that status theory predicts accurately most of the time while predictions made by balance theory are incorrect half of the time. This supports the inefficacy of balance theory for structural analysis of signed digraphs. For another comparison of the theories on signed networks, one may refer to a study of eight theories to explain signed tie formation between students [32]. In a parallel line of research on network structural analysis, researchers differentiate between classical balance theory and structural balance specifically in the way that the latter is directional [14]. They consider another setting for defining balance where absence of ties implies negative relationships. This assumption makes the theory limited to complete signed digraphs. Accordingly, 64 possible structural configurations emerge for three nodes. These configurations can be reduced to 16 classes of triads, referred to as 16 MAN triad census, based on the number of Mutual, Asymmetric and Null relationships they contain. There are only 2 out of 16 classes that are considered balanced. New definitions are suggested by researchers in order to make balance theory work in a directional context. According to Prell [39], there is a second, a third and a fourth definition of permissible triads allowing for 3, 7 and 9 classes of all 16 MAN triads. However, there have been many instances of findings in conflict with expectations [39]. Apart from directionality, the interpretation of balance measures is very important. Numerous studies have compared balance measures with their extremal values and found that signed networks are far from balanced, for example [16]. However, with such a strict criterion, we must be careful not to look for properties that are almost impossible to satisfy. A much more systematic approach is to compare values of partial balance in the signed graphs in question to the corresponding values for reshuffled graphs [33, 34] as we have done in Section 9. So far we formalized the notion of partial balance and compared various measures of balance based on their values in different graphs where the underlying structure was not important. We also evaluated the measures based on their axiomatic properties and ruled out the measures that we could not justify. In the next section, we focus on exploring real signed graphs based on the justified methods. 9. Results on real signed networks In this section, we analyse partial balance for a range of signed networks inferred from data sets of positive and negative interactions and preferences. Read’s data set for New Guinean highland tribes [40] is demonstrated as a signed graph (G1) in Fig. 5(a), where dotted lines represent negative edges and solid lines represent positive edges. The fourth time window of Sampson’s data set for monastery interactions [41] (G2) is drawn in Fig. 5(b). There are also data sets of students’ choice and rejection (G3 and G4) [42, 43] as demonstrated in Fig. 5(c) and (d). The last three are converted to undirected signed graphs by considering mutually agreed relations. A further explanation on the details of inferring signed graphs from the choice and rejection data is provided in Appendix C. Fig. 5. View largeDownload slide Four small signed networks visualized where dotted lines represent negative edges and solid lines represent positive edges (a) Highland tribes network (G1), a signed network of 16 tribes of the Eastern Central Highlands of New Guinea [40] (b) Monastery interactions network (G2) of 18 New England novitiates inferred from the integration of all positive and negative relationships [41] (c) Fraternity preferences network (G3) of 17 boys living in a pseudo-dormitory inferred from ranking data of the last week in [42] (d) College preferences network (G4) of 17 girls at an Eastern college inferred from ranking data of house B in [43]. Fig. 5. View largeDownload slide Four small signed networks visualized where dotted lines represent negative edges and solid lines represent positive edges (a) Highland tribes network (G1), a signed network of 16 tribes of the Eastern Central Highlands of New Guinea [40] (b) Monastery interactions network (G2) of 18 New England novitiates inferred from the integration of all positive and negative relationships [41] (c) Fraternity preferences network (G3) of 17 boys living in a pseudo-dormitory inferred from ranking data of the last week in [42] (d) College preferences network (G4) of 17 girls at an Eastern college inferred from ranking data of house B in [43]. A larger signed network (G5) is inferred by [44] through implementing a stochastic degree sequence model on Fowler’s data on Senate bill co-sponsorship [45]. Besides the signed social network data sets, large scale biological networks can be analysed as signed graphs. There are relatively large signed biological networks analysed by [46] and [23] from a balance viewpoint under a different terminology where monotonocity is the equivalence for balance. The two gene regulatory networks we consider are related to two organisms: a eukaryote (the yeast Saccharomyces cerevisiae) and a bacterium (Escherichia coli). Graph G6 represents the gene regulatory network of S. cerevisiae [47]. Graph G7 demonstrates the gene regulatory network of the E. coli [48]. Note that, the densities of these networks are much smaller than the other networks introduced above. In gene regulatory networks, nodes represent genes. Positive and negative edges represent activating connections and inhibiting connections respectively. Figure 6 shows the bill co-sponsorship network as well as biological signed networks. The colour of edges correspond to the signs on the edges (green for $$+1$$ and red for $$-1$$). For more details on the biological data sets, one may refer to [23]. Fig. 6. View largeDownload slide Three larger signed datasets illustrated as signed graphs in which red lines represent negative edges and green lines represent positive edges. (colour version online) (a) The bill co-sponsorship network (G5) of senators [44] (b) The gene regulatory network (G6) of Saccharomyces cerevisiae [47] (c) The gene regulatory network (G7) of the Escherichia coli [48]. Fig. 6. View largeDownload slide Three larger signed datasets illustrated as signed graphs in which red lines represent negative edges and green lines represent positive edges. (colour version online) (a) The bill co-sponsorship network (G5) of senators [44] (b) The gene regulatory network (G6) of Saccharomyces cerevisiae [47] (c) The gene regulatory network (G7) of the Escherichia coli [48]. The results are shown in Table 5. As Fig. 6 shows, graphs G6 and G7 have more than one connected component. Besides the giant component, there are a number of small components that we discard in order to use $$A(G)$$ and $$\lambda(G)$$. Note that, this procedure does not change $$T(G)$$ and $$L(G)$$ as the small components are all acyclic. The values of $$(n, m, m^-)$$ for giant components of G6 and G7 are $$(664, 1064, 220)$$ and $$(1376, 3150, 1302)$$ respectively. Table 5 Partial balance computer for signed graphs (G1–7) and reshuffled graphs Graph: $$(n, m, m^-)$$ Density $$T$$ $$A$$ $$F$$ $$\lambda$$ $$L$$ G1: (16, 58, 29) 0.483 $$\mu(G)$$ 0.87 0.88 0.76 1.04 7 $$\text{Mean}(\mu(G_r))$$ 0.50 0.76 0.49 2.08 14.65 $$\text{SD}(\mu(G_r))$$ 0.06 0.02 0.05 0.20 1.38 Z-score 6.04 5.13 5.54 $$-5.13$$ $$-5.54$$ G2: (18, 49, 12) 0.320 $$\mu(G)$$ 0.86 0.88 0.80 0.75 5 $$\text{Mean}(\mu(G_r))$$ 0.55 0.79 0.60 1.36 9.71 $$\text{SD}(\mu(G_r))$$ 0.09 0.03 0.05 0.18 1.17 Z-score 3.34 3.37 4.03 $$-3.37$$ $$-4.03$$ G3: (17, 40, 17) 0.294 $$\mu(G)$$ 0.78 0.90 0.80 0.50 4 $$\text{Mean}(\mu(G_r))$$ 0.49 0.82 0.62 0.89 7.53 $$\text{SD}(\mu(G_r))$$ 0.11 0.06 0.06 0.30 1.24 Z-score 2.64 1.32 2.85 $$-1.32$$ $$-2.85$$ G4: (17, 36, 16) 0.265 $$\mu(G)$$ 0.79 0.88 0.67 0.71 6 $$\text{Mean}(\mu(G_r))$$ 0.49 0.87 0.64 0.79 6.48 $$\text{SD}(\mu(G_r))$$ 0.14 0.03 0.06 0.17 1.08 Z-score 2.16 0.50 0.45 $$-0.50$$ $$-0.45$$ G5: (100, 2461, 1047) 0.497 $$\mu(G)$$ 0.86 0.87 0.73 8.92 331 $$\text{Mean}(\mu(G_r))$$ 0.50 0.75 0.22 17.46 965.6 $$\text{SD}(\mu(G_r))$$ 0.00 0.00 0.01 0.02 9.08 Z-score 118.5 387.8 69.89 $$-387.8$$ $$-69.89$$ G6: (690, 1080, 220) 0.005 $$\mu(G)$$ 0.54 1.00 0.92 0.02 41 $$\text{Mean}(\mu(G_r))$$ 0.58 1.00 0.77 0.02 124.3 $$\text{SD}(\mu(G_r))$$ 0.07 0.00 0.01 0.00 4.97 Z-score $$-0.48$$ 8.61 16.75 $$-8.61$$ $$-16.75$$ G7: (1461, 3215, 1336) 0.003 $$\mu(G)$$ 0.50 1.00 0.77 0.06 371 $$\text{Mean}(\mu(G_r))$$ 0.50 1.00 0.59 0.06 653.4 $$\text{SD}(\mu(G_r))$$ 0.02 0.00 0.00 0.00 7.71 Z-score $$-0.33$$ 3.11 36.64 $$-3.11$$ $$-36.64$$ Graph: $$(n, m, m^-)$$ Density $$T$$ $$A$$ $$F$$ $$\lambda$$ $$L$$ G1: (16, 58, 29) 0.483 $$\mu(G)$$ 0.87 0.88 0.76 1.04 7 $$\text{Mean}(\mu(G_r))$$ 0.50 0.76 0.49 2.08 14.65 $$\text{SD}(\mu(G_r))$$ 0.06 0.02 0.05 0.20 1.38 Z-score 6.04 5.13 5.54 $$-5.13$$ $$-5.54$$ G2: (18, 49, 12) 0.320 $$\mu(G)$$ 0.86 0.88 0.80 0.75 5 $$\text{Mean}(\mu(G_r))$$ 0.55 0.79 0.60 1.36 9.71 $$\text{SD}(\mu(G_r))$$ 0.09 0.03 0.05 0.18 1.17 Z-score 3.34 3.37 4.03 $$-3.37$$ $$-4.03$$ G3: (17, 40, 17) 0.294 $$\mu(G)$$ 0.78 0.90 0.80 0.50 4 $$\text{Mean}(\mu(G_r))$$ 0.49 0.82 0.62 0.89 7.53 $$\text{SD}(\mu(G_r))$$ 0.11 0.06 0.06 0.30 1.24 Z-score 2.64 1.32 2.85 $$-1.32$$ $$-2.85$$ G4: (17, 36, 16) 0.265 $$\mu(G)$$ 0.79 0.88 0.67 0.71 6 $$\text{Mean}(\mu(G_r))$$ 0.49 0.87 0.64 0.79 6.48 $$\text{SD}(\mu(G_r))$$ 0.14 0.03 0.06 0.17 1.08 Z-score 2.16 0.50 0.45 $$-0.50$$ $$-0.45$$ G5: (100, 2461, 1047) 0.497 $$\mu(G)$$ 0.86 0.87 0.73 8.92 331 $$\text{Mean}(\mu(G_r))$$ 0.50 0.75 0.22 17.46 965.6 $$\text{SD}(\mu(G_r))$$ 0.00 0.00 0.01 0.02 9.08 Z-score 118.5 387.8 69.89 $$-387.8$$ $$-69.89$$ G6: (690, 1080, 220) 0.005 $$\mu(G)$$ 0.54 1.00 0.92 0.02 41 $$\text{Mean}(\mu(G_r))$$ 0.58 1.00 0.77 0.02 124.3 $$\text{SD}(\mu(G_r))$$ 0.07 0.00 0.01 0.00 4.97 Z-score $$-0.48$$ 8.61 16.75 $$-8.61$$ $$-16.75$$ G7: (1461, 3215, 1336) 0.003 $$\mu(G)$$ 0.50 1.00 0.77 0.06 371 $$\text{Mean}(\mu(G_r))$$ 0.50 1.00 0.59 0.06 653.4 $$\text{SD}(\mu(G_r))$$ 0.02 0.00 0.00 0.00 7.71 Z-score $$-0.33$$ 3.11 36.64 $$-3.11$$ $$-36.64$$ Table 5 Partial balance computer for signed graphs (G1–7) and reshuffled graphs Graph: $$(n, m, m^-)$$ Density $$T$$ $$A$$ $$F$$ $$\lambda$$ $$L$$ G1: (16, 58, 29) 0.483 $$\mu(G)$$ 0.87 0.88 0.76 1.04 7 $$\text{Mean}(\mu(G_r))$$ 0.50 0.76 0.49 2.08 14.65 $$\text{SD}(\mu(G_r))$$ 0.06 0.02 0.05 0.20 1.38 Z-score 6.04 5.13 5.54 $$-5.13$$ $$-5.54$$ G2: (18, 49, 12) 0.320 $$\mu(G)$$ 0.86 0.88 0.80 0.75 5 $$\text{Mean}(\mu(G_r))$$ 0.55 0.79 0.60 1.36 9.71 $$\text{SD}(\mu(G_r))$$ 0.09 0.03 0.05 0.18 1.17 Z-score 3.34 3.37 4.03 $$-3.37$$ $$-4.03$$ G3: (17, 40, 17) 0.294 $$\mu(G)$$ 0.78 0.90 0.80 0.50 4 $$\text{Mean}(\mu(G_r))$$ 0.49 0.82 0.62 0.89 7.53 $$\text{SD}(\mu(G_r))$$ 0.11 0.06 0.06 0.30 1.24 Z-score 2.64 1.32 2.85 $$-1.32$$ $$-2.85$$ G4: (17, 36, 16) 0.265 $$\mu(G)$$ 0.79 0.88 0.67 0.71 6 $$\text{Mean}(\mu(G_r))$$ 0.49 0.87 0.64 0.79 6.48 $$\text{SD}(\mu(G_r))$$ 0.14 0.03 0.06 0.17 1.08 Z-score 2.16 0.50 0.45 $$-0.50$$ $$-0.45$$ G5: (100, 2461, 1047) 0.497 $$\mu(G)$$ 0.86 0.87 0.73 8.92 331 $$\text{Mean}(\mu(G_r))$$ 0.50 0.75 0.22 17.46 965.6 $$\text{SD}(\mu(G_r))$$ 0.00 0.00 0.01 0.02 9.08 Z-score 118.5 387.8 69.89 $$-387.8$$ $$-69.89$$ G6: (690, 1080, 220) 0.005 $$\mu(G)$$ 0.54 1.00 0.92 0.02 41 $$\text{Mean}(\mu(G_r))$$ 0.58 1.00 0.77 0.02 124.3 $$\text{SD}(\mu(G_r))$$ 0.07 0.00 0.01 0.00 4.97 Z-score $$-0.48$$ 8.61 16.75 $$-8.61$$ $$-16.75$$ G7: (1461, 3215, 1336) 0.003 $$\mu(G)$$ 0.50 1.00 0.77 0.06 371 $$\text{Mean}(\mu(G_r))$$ 0.50 1.00 0.59 0.06 653.4 $$\text{SD}(\mu(G_r))$$ 0.02 0.00 0.00 0.00 7.71 Z-score $$-0.33$$ 3.11 36.64 $$-3.11$$ $$-36.64$$ Graph: $$(n, m, m^-)$$ Density $$T$$ $$A$$ $$F$$ $$\lambda$$ $$L$$ G1: (16, 58, 29) 0.483 $$\mu(G)$$ 0.87 0.88 0.76 1.04 7 $$\text{Mean}(\mu(G_r))$$ 0.50 0.76 0.49 2.08 14.65 $$\text{SD}(\mu(G_r))$$ 0.06 0.02 0.05 0.20 1.38 Z-score 6.04 5.13 5.54 $$-5.13$$ $$-5.54$$ G2: (18, 49, 12) 0.320 $$\mu(G)$$ 0.86 0.88 0.80 0.75 5 $$\text{Mean}(\mu(G_r))$$ 0.55 0.79 0.60 1.36 9.71 $$\text{SD}(\mu(G_r))$$ 0.09 0.03 0.05 0.18 1.17 Z-score 3.34 3.37 4.03 $$-3.37$$ $$-4.03$$ G3: (17, 40, 17) 0.294 $$\mu(G)$$ 0.78 0.90 0.80 0.50 4 $$\text{Mean}(\mu(G_r))$$ 0.49 0.82 0.62 0.89 7.53 $$\text{SD}(\mu(G_r))$$ 0.11 0.06 0.06 0.30 1.24 Z-score 2.64 1.32 2.85 $$-1.32$$ $$-2.85$$ G4: (17, 36, 16) 0.265 $$\mu(G)$$ 0.79 0.88 0.67 0.71 6 $$\text{Mean}(\mu(G_r))$$ 0.49 0.87 0.64 0.79 6.48 $$\text{SD}(\mu(G_r))$$ 0.14 0.03 0.06 0.17 1.08 Z-score 2.16 0.50 0.45 $$-0.50$$ $$-0.45$$ G5: (100, 2461, 1047) 0.497 $$\mu(G)$$ 0.86 0.87 0.73 8.92 331 $$\text{Mean}(\mu(G_r))$$ 0.50 0.75 0.22 17.46 965.6 $$\text{SD}(\mu(G_r))$$ 0.00 0.00 0.01 0.02 9.08 Z-score 118.5 387.8 69.89 $$-387.8$$ $$-69.89$$ G6: (690, 1080, 220) 0.005 $$\mu(G)$$ 0.54 1.00 0.92 0.02 41 $$\text{Mean}(\mu(G_r))$$ 0.58 1.00 0.77 0.02 124.3 $$\text{SD}(\mu(G_r))$$ 0.07 0.00 0.01 0.00 4.97 Z-score $$-0.48$$ 8.61 16.75 $$-8.61$$ $$-16.75$$ G7: (1461, 3215, 1336) 0.003 $$\mu(G)$$ 0.50 1.00 0.77 0.06 371 $$\text{Mean}(\mu(G_r))$$ 0.50 1.00 0.59 0.06 653.4 $$\text{SD}(\mu(G_r))$$ 0.02 0.00 0.00 0.00 7.71 Z-score $$-0.33$$ 3.11 36.64 $$-3.11$$ $$-36.64$$ Although neither of the networks is completely balanced, the small values of $$L(G)$$ suggest that removal of some edges makes the networks completely balanced. Table 5 also provides a comparison of partial balance between different data sets of similar sizes. In this regard, it is essential to know that the choice of measure can make a substantial difference. For instance among G1–G4, under $$T(G)$$, G1 and G3 are respectively the most and the least partially balanced networks. However, if we choose $$A(G)$$ as the measure, G1 and G3 would be the least and the most partially balanced networks respectively. This confirms our previous discussions on how choosing a different measure can substantially change the results and helps to clarify some of the conflicting observations in the literature [6, 17] and [16], as previously discussed in Section 8. In Table 5, the mean and standard deviation of measures for the reshuffled graphs $$(G_r)$$, denoted by $$\text{mean}(\mu(G_r))$$ and $$\text{SD}(\mu(G_r))$$, are also provided for comparison. We implement a very basic statistical analysis as in [33, 34] using $$\text{mean}(\mu(G_r))$$ and $$\text{SD}(\mu(G_r))$$ of 500 reshuffled graphs. Reshuffling the signs on the edges 500 times, we obtain two parameters of balance distribution for the fixed underlying structure. For measures of balance, Z-scores are calculated based on Equation (9.1). \begin{align} \label{eq26} Z=\frac{\mu(G)-\text{mean}(\mu(G_r))}{\text{SD}(\mu(G_r))} \end{align} (9.1) The Z-score shows how far the balance is with regards to balance distribution of the underlying structure. Positive values of Z-score for $$T(G)$$, $$A(G)$$ and $$F(G)$$ can be interpreted as existence of more partial balance than the average random level of balance. It is worth pointing out that the statistical analysis we have implemented is independent of the normalization method used in $$A(G)$$ and $$F(G)$$. The two right columns of 5 provide $$\lambda(G)$$ and $$L(G)$$ alongside their associated Z-scores. The Z-scores show that as measured by the frustration index and algebraic conflict, signed networks G1–G7 exhibit a level of partial (but not total) balance beyond that expected by chance. Based on these two measures, the level of partial balance is high for graphs G1, G2, G5, G6 and G7 while it is low for graph G3 and negligibly low for graph G4. It indicates that most of the real signed networks investigated are relatively consistent with the theory of structural balance. However, the Z-scores obtained based on the triangle index for G6–G7 show totally different results. Note that G6 and G7 are relatively sparse graphs which only have 70 and 1052 triangles. This may explain the difference between Z-scores of $$T(G)$$ and that of other measures. The numerical results using the algebraic conflict and frustration index support previous observations of real-world networks’ closeness to balance [6, 17]. 10. Conclusion and future research In this study, we started by discussing balance in signed networks in Sections 2 and 3 and introduced the notion of partial balance. We discussed different ways to measure partial balance in Section 4 and provided some observations on synthetic data in Sections 5 and 6. After gaining an understanding of the behaviour of different measures, basic axioms and desirable properties were used in Section 7 to rule out the measures that cannot be justified. We have discussed various methodologies and how they have led to conflicting observations in the literature in Section 8. Taking axiomatic properties of the measures into account, using the common cycle based measures denoted by $$D(G)$$ and $$C(G)$$ and the walk-based measure $$W(G)$$ is not recommended. $$D_k(G)$$ and $$A(G)$$ may introduce some problems, but overall using them seems to be more appropriate compared to $$D(G)$$, $$C(G)$$ and $$W(G)$$. The observations on synthetic data taken together with the axiomatic properties, recommend $$F(G)$$ as the best overall measure of partial balance. However, considering the difficulty of computing the exact value of $$L(G)$$ for very large graphs, one may approximate it using a non-zero optimality gap tolerance with the optimization models in [38]. Alternatively, $$A(G)$$ and $$D_k(G)$$ seem to be the other options accepting their potential shortcomings. Using the three measures $$F(G)$$, $$T(G)$$ and $$A(G)$$, each representing a family of measures, we compared balance in real signed graphs and analogous reshuffled graphs having the same structure in Section 9. Table 5 provides this comparison showing that different results are obtained under different measures. Returning to the questions posed at the beginning of this article, it is now possible to state that under the frustration index and algebraic conflict many signed networks exhibit a level of partial (but not total) balance beyond that expected by chance. However, the numerical results in Table 5 show that the level of balance observed using the triangle index can be totally different. One of the more significant findings to emerge from this study is that methods suggested for measuring balance may have different context and may require some justification before being interpreted based on their values. The present study confirms that some measures of partial balance cannot be taken as a reliable static measure to be used for analysing network dynamics. Although a numerical part of the current study is based on signed networks with less than a few thousand nodes, the analytical findings that were not restricted to a particular size suggest the inefficacy of some methods for analysing larger networks as well. One gap in this study is that we avoid using structural balance theory for analysing directed networks, making directed signed networks like Epinions, Slashdot and Wikipedia Elections [10, 16, 31] data sets untested by our approach. However, see our discussion in Section 8. The findings of this study have a number of important implications for future investigation. Although this study focuses on partial balance, the findings may well have a bearing on link prediction and clustering in signed networks [49]. Some other theoretical topics of interest in signed networks are network dynamics [50] and opinion dynamics [51]. Effective methods of signed network structural analysis can contribute to these topics as well. From a practical viewpoint, international relations is a crucial area to implement signed network structural analysis. Having an efficient measure of partial balance in hand, international relations can be investigated in terms of evaluation of partial balance over time for networks of states. Footnotes Edited by: Michele Benzi Acknowledgements We are grateful for the valuable comments from the anonymous reviewers that have improved this article. References 1. Heider F. ( 1944 ) Social perception and phenomenal causality. Psychol. Rev. , 51 , 358 – 378 . Google Scholar CrossRef Search ADS 2. Cartwright D. & Harary F. ( 1956 ) Structural balance: a generalization of Heider’s theory. Psychol. Rev. , 63 , 277 – 293 . Google Scholar CrossRef Search ADS PubMed 3. Harary F. ( 1957 ) Structural duality. Behav. Sci. , 2 , 255 – 265 . Google Scholar CrossRef Search ADS 4. Acharya B. D. ( 1980 ) Spectral criterion for cycle balance in networks. J. Graph Theor. , 4 , 1 – 11 . Google Scholar CrossRef Search ADS 5. Zaslavsky T. ( 2010 ) Balance and clustering in signed graphs. Slides from Lectures at the CR RAO Advanced Institute of Mathematics, Statistics and Computer Science , vol. 26 . India : Univ. of Hyderabad . 6. Kunegis J. ( 2014 ) Applications of structural balance in signed social networks. Preprint arXiv:1402.6865 . 7. Hou Y. P. ( 2004 ) Bounds for the least Laplacian eigenvalue of a signed graph. Acta Math. Sin. , 21 , 955 – 960 . Google Scholar CrossRef Search ADS 8. Harary F. ( 1977 ) Graphing conflict in international relations. The PAPERS of the Peace Science Society , 27 , 1 – 10 . 9. Giscard P.-L. , Kriege N. & Wilson R. C. ( 2016a ) A general purpose algorithm for counting simple cycles and simple paths of any length. Preprint arXiv:1612.05531 . 10. Giscard P.-L. , Rochet P. & Wilson R. C. ( 2016b ) Evaluating balance on social networks from their simple cycles. J. Complex Netw ., to appear. Preprint arXiv:1606.03347 . 11. Norman R. Z. & Roberts F. S. ( 1972 ) A derivation of a measure of relative balance for social structures and a characterization of extensive ratio systems. J. Math. Psychol. , 9 , 66 – 91 . Google Scholar CrossRef Search ADS 12. Birmelé E. , Ferreira R. , Grossi R. , Marino A. , Pisanti N. , Rizzi R. & Sacomoto G. ( 2013 ) Optimal listing of cycles and st-paths in undirected graphs. Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms ( Khanna Sanjeev ed.). New Orleans, Louisiana : SIAM , pp. 1884 – 1896 . 13. Terzi E. & Winkler M. ( 2011 ) A spectral algorithm for computing social balance. International Workshop on Algorithms and Models for the Web-Graph ( Frieze A. Horn P. & Prałat P. eds). Berlin, Heidelberg : Springer , pp. 1 – 13 . 14. Bonacich P. ( 2012 ) Introduction to Mathematical Sociology . Princeton : Princeton University Press . 15. Pelino V. & Maimone F. ( 2012 ) Towards a class of complex networks models for conflict dynamics. Preprint arXiv:1203.1394 . 16. Estrada E. & Benzi M. ( 2014 ) Walk-based measure of balance in signed networks: detecting lack of balance in social networks. Phys. Rev. E , 90 , 1 – 10 . Google Scholar CrossRef Search ADS 17. Facchetti G. , Iacono G. & Altafini C. ( 2011 ) Computing global structural balance in large-scale signed social networks. Proc. Natl. Acad. Sci. , 108 , 20953 – 20958 . Google Scholar CrossRef Search ADS 18. Singh R. & Adhikari B. ( 2017 ) Measuring the balance of signed networks and its application to sign prediction. J. Stat. Mech. Theory Exp. , 6 , 1 – 16 . 19. Kunegis J. , Schmidt S. , Lommatzsch A. , Lerner J. , De Luca E. W. & Albayrak S. ( 2010 ) Spectral analysis of signed graphs for clustering, prediction and visualization. SDM ( Parthasarathy S. Liu B. Goethals B , Pei J. & Kamath C. ), vol. 10 . Columbus, Ohio, USA : SIAM , pp. 559 – 570 . 20. Belardo F. ( 2014 ) Balancedness and the least eigenvalue of Laplacian of signed graphs. Linear Algebra Appl. , 446 , 133 – 147 . Google Scholar CrossRef Search ADS 21. Belardo F. & Zhou Y. ( 2016 ) Signed Graphs with extremal least Laplacian eigenvalue. Linear Algebra Appl. , 497 , 167 – 180 . Google Scholar CrossRef Search ADS 22. Harary F. ( 1959 ) On the measurement of structural balance. Behav. Sci. , 4 , 316 – 323 . Google Scholar CrossRef Search ADS 23. Iacono G. , Ramezani F. , Soranzo N. & Altafini C. ( 2010 ) Determining the distance to monotonicity of a biological network: a graph-theoretical approach. Syst. Biol., IET , 4 , 223 – 235 . Google Scholar CrossRef Search ADS 24. Garey M. R. & Johnson D. S. ( 2002 ) Computers and intractability. vol. 29 , a guide to the theory of NP-completeness . 25. El Maftouhi A. , Manoussakis Y. & Megalakaki O. ( 2012 ) Balance in random signed graphs. Internet Math. , 8 , 364 – 380 . Google Scholar CrossRef Search ADS 26. Tomescu I. ( 1973 ) Note sur une caractérisation des graphes dont le degré de déséquilibre est maximal. Math. Sci. Hum. , 42 , 37 – 40 . 27. Akiyama J. , Avis D. , Chvátal V. & Era H. ( 1981 ) Balancing signed graphs. Discrete Appl. Math. , 3 , 227 – 233 . Google Scholar CrossRef Search ADS 28. Doreian P. ( 2005 ) Generalized Blockmodeling: Structural Analysis in the Social Sciences , vol. 25 . Cambridge, New York : Cambridge University Press , 2005 . 29. Doreian P. & Mrvar A. ( 2009 ) Partitioning signed social networks. Soc. Netw. , 31 , 1 – 11 . Google Scholar CrossRef Search ADS 30. Doreian P. ( 2015 ) Structural Balance and Signed International Relations. J. Soc. Struct. , 16 , 1 – 49 . 31. Leskovec J. , Huttenlocher D. & Kleinberg J. ( 2010 ) Signed networks in social media. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems ( Mynatt E. D. Schoner D. Fitzpatrick G. Hudson S. E. Edwards W. K. & Rodden T. eds). Atlanta, Georgia, USA : ACM , pp. 1361 – 1370 . 32. Yap J. & Harrigan N. ( 2015 ) Why does everybody hate me? Balance, status, and homophily: the triumvirate of signed tie formation. Soc. Netw. , 40 , 103 – 122 . Google Scholar CrossRef Search ADS 33. Szell M. , Lambiotte R. & Thurner S. ( 2010 ) Multirelational organization of large-scale social networks in an online world. Proc. Natl. Acad. Sci. , 107 , 13636 – 13641 . Google Scholar CrossRef Search ADS 34. Szell M. & Thurner S. ( 2010 ) Measuring social dynamics in a massive multiplayer online game. Soc. Netw. , 32 , 313 – 329 . Google Scholar CrossRef Search ADS 35. Cai Q. , Gong M. , Ma L. , Wang S. , Jiao L. & Du H. ( 2015 ) A particle swarm optimization approach for handling network social balance problem. IEEE Congress on Evolutionary Computation (CEC), 2015 ( Obayashi S. Poloni C. & Murata T. eds). Sendai, Japan : IEEE , pp. 3186 – 3191 . 36. Ma L. , Gong M. , Du H. , Shen B. & Jiao L. ( 2015 ) A memetic algorithm for computing and transforming structural balance in signed networks. Knowledge Based Syst. , 85 , 196 – 209 . Google Scholar CrossRef Search ADS 37. Schwartz T. ( 2010 ) The friend of my enemy is my enemy, the enemy of my enemy is my friend: axioms for structural balance and bi-polarity. Math. Soc. Sci. , 60 , 39 – 45 . Google Scholar CrossRef Search ADS 38. Aref S. , Mason A. J. & Wilson M. C. ( 2016 ) An exact method for computing the frustration index in signed networks using binary programming. Preprint arXiv:1611.09030 . 39. Prell C. ( 2012 ) Social Network Analysis: History, Theory and Methodology . London, Los Angeles : SAGE , 2012 . 40. Read K. E. ( 1954 ) Cultures of the central highlands, New Guinea. Southwestern J. Anthropol. , 10 , 1 – 43 . Google Scholar CrossRef Search ADS 41. Sampson S. F. ( 1968 ) A novitiate in a period of change. An experimental and case study of social relationships (PhD thesis). Cornell University, Ithaca . 42. Newcomb T. M. ( 1961 ) The Acquaintance Process. New York: Holt, Rinehart and Winston. The General Nature of Peer Group Influence, pp. 2-16 in College Peer Groups ( Newcomb T. M. & Wilson E. K. eds). Chicago : Aldine Publishing Co . 43. Lemann T. B. & Solomon R. L. ( 1952 ) Group characteristics as revealed in sociometric patterns and personality ratings. Sociometry , 15 , 7 – 90 . Google Scholar CrossRef Search ADS 44. Neal Z. ( 2014 ) The backbone of bipartite projections: inferring relationships from co-authorship, co-sponsorship, co-attendance and other co-behaviors. Soc. Netw. , 39 , 84 – 97 . Google Scholar CrossRef Search ADS 45. Fowler J. H. ( 2006 ) Legislative cosponsorship networks in the US House and Senate. Soc. Netw. , 28 , 454 – 465 . Google Scholar CrossRef Search ADS 46. DasGupta B. , Enciso Enciso. G. , Sontag E. & Zhang Y. ( 2007 ) Algorithmic and complexity results for decompositions of biological networks into monotone subsystems. Biosystems , 90 , 161 – 178 . Google Scholar CrossRef Search ADS PubMed 47. Costanzo M. C. , Crawford Crawford. M. , Hirschman Hirschman. J. , Kranz Kranz. J. , Olsen P. , Robertson Robertson. L. , Skrzypek Skrzypek. M. , Braun Braun. B. , Hopkins Hopkins. K. , Kondu P. , Lengieza C. , Lew-Smith Lew-Smith. J. , Tillberg M. & Garrels J. I. ( 2001 ) YPD, PombePD and WormPD: model organism volumes of the BioKnowledge Library, an integrated resource for protein information. Nucleic Acids Res. , 29 , 75 – 79 . Google Scholar CrossRef Search ADS PubMed 48. Salgado H. , Gama-Castro S. , Peralta-Gil M. , Diaz-Peredo E. , Sánchez-Solano F. , Santos-Zavaleta A. , Martinez-Flores I. , Jiménez-Jacinto V. , Bonavides-Martinez C. , Segura-Salazar J. et al. ( 2006 ) RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Res. , 34 (suppl. 1) , D394 – D397 . Google Scholar CrossRef Search ADS PubMed 49. Gallier J. ( 2016 ) Spectral theory of unsigned and signed graphs. applications to graph clustering: a survey. Preprint arXiv:1601.04692 . 50. Tan S. & Lü J. ( 2016 ) An evolutionary game approach for determination of the structural conflicts in signed networks. Sci. Rep. , 6 , 22022 . Google Scholar CrossRef Search ADS PubMed 51. Li Y. , Chen W. , Wang Y. & Zhang Z.-L. ( 2015 ) Voter Model on Signed Social Networks. Internet Math. , 11 , 93 – 133 . Google Scholar CrossRef Search ADS 52. Flajolet P. & Sedgewick R. ( 2009 ) Analytic Combinatorics . Cambridge UK : Cambridge University Press . 53. Abelson R. P. & Rosenberg M. J. ( 1958 ) Symbolic psycho-logic: a model of attitudinal cognition. Behav. Sci. , 3 , 1 – 13 . Google Scholar CrossRef Search ADS 54. Doreian P. ( 2008 ) A multiple indicator approach to blockmodeling signed networks. Soc. Netw. , 30 , 247 – 258 . Google Scholar CrossRef Search ADS Appendix A. Details of calculations In order to simplify the sum $$E(D_k(G)) = \sum \limits_{i \: \text{even}} {\binom{k}{i}} q^i (1-q)^{k-i}$$, one may add the two following equations and divide the result by 2: \begin{align} \sum \limits_{i} {\binom{k}{i}} q^i (1-q)^{k-i} &= (q+(1-q))^k \\ \end{align} (A.1) \begin{align} \sum \limits_{i} {\binom{k}{i}} (-q)^i (1-q)^{k-i} &= (-q+(1-q))^k \end{align} (A.2) In $${K}_n^a$$, a $$k$$-cycle is specified by choosing $$k$$ vertices in some order, then correcting for the overcounting by dividing by $$2$$ (the possible directions) and $$k$$ (the number of starting points, namely the length of the cycle). If the unique negative edge is required to belong to the cycle, by orienting this in a fixed way we need choose only $$k-2$$ further elements in order, and no overcounting occurs. The numbers of negative cycles and total cycles are as follows. \begin{align} \sum_{k=3}^n O_k^- =\sum_{k=3}^n\frac{(n-2)!}{(n-k)!} ,\quad \sum_{k=3}^n O_k =\sum_{k=3}^n \frac{n!}{2k(n-k)!} \end{align} (A.3) Asymptotic approximations for these sums can be obtained by introducing the exponential generating function. For example, letting $$a_n = \sum_{1\leq k\leq n} \frac{n!}{(n-k)!k}$$ we have $$\sum_{n\geq 0} \frac{a_n}{n!}x^n = \sum_{n,k} \sum_{k\leq n} \frac{n!}{(n-k)!k} x^n = \sum_{k\geq 1} \frac{1}{k} \sum_{n\geq k} \frac{1}{(n-k)!} x^n = \sum_{k\geq 1} \frac{x^k}{k} \sum_{m\geq 0} \frac{1}{m!}x^m = e^x \log\left(\frac{1}{1-x}\right)\!.$$ Similarly we obtain $$\sum_{n\geq 0} \sum_{k\geq 0} \frac{1}{(n-k)!} x^n = \frac{e^x}{1-x}.$$ Standard singularity analysis methods [52] show the denominator of the expression for $$D(K^a_n)$$ to be asymptotic to $$(n-1)!e/2$$ while the number of negative cycles is asymptotic to $$(n-2)!e$$. Similarly the weighted sum defining $$C(K^a_n)$$, where we choose $$f(k) = 1/k!$$, can be expressed using the ordinary generating function, which for the denominator turns out to be $$\frac{1}{1-x} \log\left( \frac{1-2x}{1-x}\right).$$ Again, singularity analysis techniques yield an approximation $$2^n/n$$. The numerator is easier and asymptotic to $$2^n/(n^2-n)$$. This yields the result. The unsigned adjacency matrix $$|\textbf{A}|$$ of the complete graph has the form $$\textbf{E} - \textbf{I}$$ where $$\textbf{E}$$ is the matrix of all $$1$$’s. The latter matrix has rank 1 and non-zero eigenvalue $$n$$. Thus $$|\textbf{A}| _{({K}_n^a)}$$ has eigenvalues $$n-1$$ (with multiplicity 1) and $$-1$$ (with multiplicity $$n-1$$). The matrix $$\textbf{A} _{({K}_n^a)}$$ has a similar form and we can guess eigenvectors of the form $$(-1,1,0, \dots, 0)$$ and $$(a,a,1,1, \dots , 1)$$. Then $$a$$ satisfies a quadratic $$2a^2 + (n-3)a - (n-2) = 0$$. Solving for $$a$$ and the corresponding eigenvalues, we obtain eigenvalues $$(n-4 \pm \sqrt{(n-2)(n+6)})/2, 1, -1$$ (with multiplicity $$n-3$$)). This yields $$K{({K}_n^a)} = \frac{(n-3)e^{-1} + e + e^{\frac{n-4 - \sqrt{(n-2)(n+6)}}{2}} + e^{\frac{n-4 + \sqrt{(n-2)(n+6)}}{2}}}{(n-1)e^{-1} + e^{n-1}}$$ which results in $$W({K}_n^a) \sim \frac{1+e^{-4/n}}{2}$$. Furthermore, since every node of $$K_n$$ has degree $$n-1$$, the eigenvalues of $$\textbf{L}:=(n-1)\textbf{I}-\textbf{A}$$ are precisely of the form $$n-1 - \lambda$$ where $$\lambda$$ is an eigenvalue of $$\textbf{A}$$. Measures of partial balance for $${K}_n^a$$ can therefore be expressed by the formulae (A.4) – (A.10): \begin{gather}\label{eq15} D({K}_n^a)=1- \frac{\sum_{k=3}^n \frac{(n-2)!}{(n-k)!}}{\sum_{k=3}^n \frac{n!}{2k(n-k)!}} \sim 1 - \frac{2}{n}\\ \label{eq15.5} \end{gather} (A.4) \begin{gather} C({K}_n^a)=1- \frac{\sum_{k=3}^n \frac{(n-2)!}{(n-k)!k!}}{\sum_{k=3}^n \frac{n!}{2k(n-k)!k!}} \sim 1 - \frac{1}{n}\\ \label{eq17} \end{gather} (A.5) \begin{gather} D_k({K}_n^a)=1-\frac{\frac{(n-2)!}{(n-k)!}}{\frac{n!}{2k(n-k)!}}=1-\frac{2k}{n(n-1)} \sim 1 - \frac{2k}{n^2}\\ \label{eq16} \end{gather} (A.6) \begin{gather} W({K}_n^a) \sim \frac{1+e^{-4/n}}{2} \sim 1-\frac{2}{n}\\ \label{eq17.2} \end{gather} (A.7) \begin{gather} \lambda({K}_n^a)= n-1 - (n-4 + \sqrt{(n-2)(n+6)})/2 = (n+2 - \sqrt{(n-2)(n+6)})/2\\ \label{eq17.4} \end{gather} (A.8) \begin{gather} A({K}_n^a)= 1- \frac{n+2 - \sqrt{(n-2)(n+6)}}{2n-4} \sim 1-\frac{4}{n^2}\\ \label{eq18} \end{gather} (A.9) \begin{gather} F({K}_n^a)=1-\frac{2}{n(n-1)/2}=1-\frac{4}{n(n-1)} \sim 1 - \frac{4}{n^2} \end{gather} (A.10) In $${K}_n^c$$, all cycles of odd length are unbalanced and all cycles of even length are balanced. Therefore: \begin{align} \sum_{k=3}^n O_k^+ = \sum_{\text{even}}^n \frac{n!}{2k(n-k)!} \end{align} (A.11) It follows that $$D_k({K}_n^c)$$ equals 0 for odd $$k$$ and 1 for even $$k$$. Based on maximality of $$\lambda(G)$$ in $${K}_n^c$$, $$A({K}_n^c)=0$$. Using the above generating function techniques we obtain that the numerator of $$D(K^c_n)$$ is asymptotic to $$(n-1)!(e+(-1)^ne^{-1})/4$$. The denominator we know from above is asymptotic to $$(n-1)!e/2$$. This yields $$D(K^c_n) \sim 1/2 + (-1)^n e^{-2}$$. Note that $$e^{-2} \approx 0.135$$ and this explains the oscillation in Fig. 4. Similarly we obtain results for $$C(K^c_n)$$. $$|\textbf{A}|_{({K}_n^c)}$$ has eigenvalues $$n-1$$ (with multiplicity 1) and $$-1$$ (with multiplicity $$n-1$$). The matrix $$\textbf{A}_{({K}_n^c)}$$ has a similar form and the corresponding eigenvalues would be $$1-n$$ (with multiplicity 1) and $$1$$ (with multiplicity $$n-1$$). This yields $$K{({K}_n^c)} = \frac{(n-1)e^{1} + e^{1-n}}{(n-1)e^{-1} + e^{n-1}}$$ which results in $$W({K}_n^c) \sim \frac{1+e^{2-2n}}{2}$$. Moreover, a closed-form formula for $$L({K}_n^c)$$ can be expressed based on a maximum cut which gives a function of $$n$$ equal to an upper bound of the frustration index under a different name in [53]. Measures of partial balance for $${K}_n^c$$ can be expressed via the closed-form formulae as stated in (A.12)–(A.18): \begin{gather}\label{eq21} D({K}_n^c)= \frac{\sum_{\text{$k$ even}}^n \frac{n!}{2k(n-k)!}}{\sum_{k=3}^n \frac{n!}{2k(n-k)!}} \sim \frac{1}{2} + (-1)^n e^{-2}\\ \label{eq22} \end{gather} (A.12) \begin{gather} C({K}_n^c)= \frac{\sum_{\text{$k$ even}}^n \frac{n!}{2k(n-k)!k!}}{\sum_{k=3}^n \frac{n!}{2k(n-k)!k!}} \sim \frac{1}{2} - \frac{3n\log n}{2^n}\\ \label{eq22.5} \end{gather} (A.13) \begin{gather} D_k({K}_n^c)= \left\{ \begin{array}{ll} 1 & \mbox{if } k \ \text{is even} \\ 0 & \mbox{if } k \ \text{is odd} \end{array} \right.\\ \label{eq23} \end{gather} (A.14) \begin{gather} W({K}_n^c) \sim \frac{1+e^{2-2n}}{2}\\ \end{gather} (A.15) \begin{gather} \label{eq23.3} \lambda({K}_n^c)=\lambda_\text{max}=\overline{d}_{\text{max}}-1=n-2\\ \end{gather} (A.16) \begin{gather} \label{eq24} L({K}_n^c)= \left\{ \begin{array}{ll} ({n^2-2n})/{4} & \mbox{if } n \ \text{is even} \\ ({n^2-2n+1})/{4} & \mbox{if } n \ \text{is odd} \end{array} \right.\\ \end{gather} (A.17) \begin{gather} \label{eq25} F({K}_n^c)= \left\{ \begin{array}{ll} 1-\frac{n(n-2)/4}{n(n-1)/4}=\frac{1}{n-1} & \mbox{if } n \ \text{is even} \\ 1-\frac{(n-1)(n-1)/4}{n(n-1)/4}=\frac{1}{n} & \mbox{if } n \ \text{is odd} \end{array} \right. \end{gather} (A.18) Appendix B. Counterexamples and proof ideas for the axioms and desirable properties Axioms: Axiom 1 holds in all the measures introduced due to the systematic normalization implemented. $$D_k(G)$$, $$A(G)$$ and $$Y(G)$$ do not satisfy Axiom 2. All $$k$$-cycles being balanced, $$D_k(G)$$ fails to detect the imbalance in graphs with unbalanced cycles of different length. $$A(G \oplus C^+)=1$$ for unbalanced graphs which makes $$A(G)$$ fail Axiom 2. $$Y(G)$$ fails on detecting balance in completely bi-polar signed graphs that are indeed balanced. As long as $$\mu(G\oplus H)$$ can be written in the form of $$(a+c)/(b+d)$$ where $$\mu(G)=a/b$$ and $$\mu(H)=c/d$$, $$\mu$$ satisfies Axiom 3. So all the measures considered satisfy Axiom 3, except for $$A(G)$$. In case of $$\lambda(G) < \lambda(H)$$ and $$\lambda_\text{max}(G) < \lambda_\text{max}(H)$$, $$A(G \oplus H)= 1 - \frac{\lambda(G)}{\lambda_\text{max}(H)} > A(H)$$ which shows that $$A(G)$$ fails Axiom 3. The sign of cycles (closed walks), the Laplacian eigenvalues [21], and the frustration index [5] will not change by applying the switching function introduced in (3.1). Therefore, Axiom 4 holds for all the measures discussed except for $$X(G)$$ and $$Y(G)$$ because they depend on $$m^-$$, which changes in switching. Desirable properties: Clearly in B1, $$C^+_3$$ contributes positively to $$D(G)$$ and $$C(G)$$, whereas for $$D_k(G)$$ it depends on $$k$$ which makes it fail B1. As $$W(C^+_3)$$ equals 1, $${\mathrm{Tr}}(e^A)/{\mathrm{Tr}}(e^{|A|})$$ would be added by equal terms in both the numerator and denominator leading to $$W(G)$$ satisfying B1. $$A(G)$$ satisfies B1 because $$A(G\oplus C^+_3)=1$$. As $$m$$ increases by 3, $$F(G)$$ satisfies B1. The dependency of $$X(G)$$ and $$Y(G)$$ on $$m^-$$ and incapability of the binary measure, $$Z(G)$$, in providing values between 0 and 1 make them fail B1. $$C^-_3$$ adds only to the denominators of $$D(G)$$ and $$C(G)$$, whereas for $$D_k(G)$$ it depends on $$k$$ which makes it fail B2. Following the addition of a negative 3-cycle, $$W(G)$$ is observed to increase resulting in its failure in B2 (e.g. take $$G=K_5$$ with single negative edge, and $$C^-_3$$ having a single negative edge). As $$A(G)\neq0$$ and $$A(C^-_3)=0$$, the value of $$A(G \oplus C^-_3)$$ does not change when a negative 3-cycle is added. Therefore, it fails B2. Moreover, $$F(G)$$ fails B2 whenever $$L(G) \geq m/3$$ as observed in a family of graphs in Section 6.2. However, $$F^\prime (G)$$ introduced in Eq. 6.1 which only differs in normalization, satisfies this desirable property. The measures $$X(G)$$ and $$Y(G)$$ fail B2, but the binary measure, $$Z(G)$$, satisfies it. All the cycle-based measures, namely $$D(G),C(G)$$ and $$D_3(G)$$ fail B3 (e.g. take $$G=K_4$$ with two symmetrically located negative edges). $$W(G)$$ is also observed to fail B3 (for instance, take $$G$$ as the disjoint union of a 3-cycle and a 5-cycle each having 1 negative edge). It is known that $$\lambda(G \ominus e) \leq \lambda(G)$$ [21]. However in some cases where $$\lambda_\text{max}(G \ominus e) < \lambda_\text{max} (G)$$ counterexamples are found showing $$A(G)$$ fails on B3 (consider a graph with $$n=8$$, $$m^+=10$$, $$m^-=3$$, $$\min{|E^*|}=3$$ in which $$\lambda_\text{max} (G)=6$$ and $$\lambda_\text{max}(G \ominus e)=3$$). $$Y(G)$$ and $$Z(G)$$ fail B3. Moreover, $$F(G)$$ satisfies B3 because $$L(G \ominus e) = L(G) -1$$. The cycle-based measures and $$W(G)$$ do not satisfy B4. For $$D_3(G)$$, we tested a graph with $$n=7, m=15, |E^*|=3$$ and we observed $$D_3(G \oplus e) > D_3(G)$$. According to Belardo and Zhou, $$\lambda(G \oplus e) \geq \lambda(G)$$ [21]. However in some cases where $$\lambda_\text{max}(G \oplus e) > \lambda_\text{max} (G)$$ counterexamples are found showing $$A(G)$$ fails on B4. counterexamples showing $$D(G),C(G),W(G)$$ and $$A(G)$$ fail B4, are similar to that of B3. Moreover, $$F(G)$$ satisfies B4 as do $$X(G)$$ and $$Z(G)$$, while $$Y(G)$$ fails B4 when $$e$$ is positive. Appendix C. Inferring undirected signed graphs Sampson collected different sociometric rankings from a group of 18 monks at different times [41]. The data provided includes rankings on like, dislike, esteem, disesteem, positive influence, negative influence, praise and blame. We have considered all the positive rankings as well as all the negative ones. Then only the reciprocated relations with similar signs are considered to infer an undirected signed edge between two monks (see [29] and how the authors inferred a directed signed graph in their Table 5 by summing the influence, esteem and respect relations). Newcomb reported rankings made by 17 men living in a pseudo-dormitory [42]. We used the ranking data of the last week which includes complete ranks from 1 to 17 gathered from each man. As the gathered data is related to complete ranking, we considered ranks 1–5 as one-directional positive relations and 12–17 as one-directional negative relations. Then only the reciprocated relations with similar signs are considered to infer an undirected signed edge between two men (see [29] and how the authors converted the top three and bottom three ranks to a directed signed edges in their Fig. 4.). Lemann and Solomon collected ranking data based on multiple criteria from female students living in off-campus dormitories [43]. We used the data for house B which is resulted by integrating top and bottom three one-directional rankings each for multiple criteria. As the gathered data itself is related to top and bottom rankings, we considered all the ranks as one-directional signed relations. Then only the reciprocated relations with similar signs are considered to infer an undirected signed edge between two women (see [54] and how the author inferred a directed signed graph in their Fig. 5 from the data for house B). © The authors 2017. Published by Oxford University Press. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Journal of Complex Networks Oxford University Press

# Measuring partial balance in signed networks

Journal of Complex Networks, Volume 6 (4) – Aug 1, 2018
30 pages

/lp/ou_press/measuring-partial-balance-in-signed-networks-WVki4nMECb
Publisher
Oxford University Press
ISSN
2051-1310
eISSN
2051-1329
D.O.I.
10.1093/comnet/cnx044
Publisher site
See Article on Publisher Site

### Abstract

Abstract Is the enemy of an enemy necessarily a friend? If not, to what extent does this tend to hold? Such questions were formulated in terms of signed (social) networks and necessary and sufficient conditions for a network to be ‘balanced’ were obtained around 1960. Since then the idea that signed networks tend over time to become more balanced has been widely used in several application areas. However, investigation of this hypothesis has been complicated by the lack of a standard measure of partial balance, since complete balance is almost never achieved in practice. We formalize the concept of a measure of partial balance, discuss various measures, compare the measures on synthetic datasets and investigate their axiomatic properties. The synthetic data involves Erdős–Rényi and specially structured random graphs. We show that some measures behave better than others in terms of axioms and ability to differentiate between graphs. We also use well-known data sets from the sociology and biology literature, such as Read’s New Guinean tribes, gene regulatory networks related to two organisms, and a network involving senate bill co-sponsorship. Our results show that substantially different levels of partial balance is observed under cycle-based, eigenvalue-based and frustration-based measures. We make some recommendations for measures to be used in future work. 1. Introduction Transitivity of relationships has a pivotal role in analysing social interactions. Is the enemy of an enemy a friend? What about the friend of an enemy or the enemy of a friend? Network science is a key instrument in the quantitative analysis of such questions. Researchers in the field are interested in knowing the extent of transitivity of ties and its impact on the global structure and dynamics in communities with positive and negative relationships. Whether the application involves international relationships among states, friendships and enmities between people, or ties of trust and distrust formed among shareholders, relationship to a third entity tends to be influenced by immediate ties. There is a growing body of literature that aims to connect theories of social structure with network science tools and techniques to study local behaviours and global structures in signed graphs that come up naturally in many unrelated areas. The building block of structural balance is a work by Heider [1] that was expanded into a set of graph-theoretic concepts by Cartwright and Harary [2] to handle a social psychology problem a decade later. The relationship under study has an antonym or dual to be expressed by the opposite sign [3]. In a setting where the opposite of a negative relationship is a positive relationship, a tie to a distant neighbour can be expressed by the product of signs reaching him. Cycles containing an odd number of negative edges are considered to be unbalanced, guaranteeing total balance therefore only in networks containing no such cycles. This strict condition makes it quite unlikely for a signed network to be totally balanced. The literature on signed networks suggests many different formulae to measure balance. These measures are useful for detecting total balance and imbalance, but for intermediate cases their performance is not clear and has not been systematically studied. Our contribution The main focus of this article is to provide insight into measuring partial balance, as much uncertainty still exists on this. The dynamics leading to specific global structures in signed networks remain speculative even after studies with fine-grained approaches. The central thesis of this article is that not all measures are equally useful. We provide a numerical comparison of several measures of partial balance on a variety of undirected signed networks, both randomly generated and inferred from well-known data sets. Using theoretical results for simple classes of graphs, we suggest an axiomatic framework to evaluate these measures and shed light on the methodological details involved in using such measures. This article begins by laying out the theoretical dimensions of the research in Section 2 and looks at basic definitions and terminology. In Section 3, different means of checking for total balance are outlined. Section 4 discusses some approaches to measuring partial balance in Eqs (4.1)–(4.8), categorized into three families of measures 4.1–4.3 and summarized in Table 1. Numerical results on synthetic data are provided in Figs 1 and 2 in Section 5. Section 6 is concerned with analytical results on synthetic data in closed-form formulae in Table 2 and visually represented in Figs 3 and 4. Axioms and desirable properties are suggested in Section 7 to evaluate the measures systematically. Section 8 concerns recommendations for choosing a measure of balance. Numerical results on real signed networks are presented in Section 9. Finally, Section 10 summarizes the study and provides direction for future research. Table 1 Measures of partial balance summarized Measure Name, reference and description $$D(G)$$ Degree of balance [2, 22] A cycle-based measure representing the ratio of balanced cycles $$C(G)$$ Weighted degree of balance [11] An extension of $$D(G)$$ using cycles weighted by a non-increasing function of length $$D_k(G)$$ Relative $$k$$-balance [3, 8] An extension of $$D(G)$$ placing a non-zero weight only on cycles of length $$k$$ $$T(G)$$ Triangle index [6, 13] A triangle-based measure representing the ratio of balanced triangles ($$D_3(G)$$) $$W(G)$$ Walk-based measure of balance [15, 16] A simplified extension of $$D(G)$$ replacing cycles by closed walks $$A(G)$$ Normalized algebraic conflict [6, 19] A normalized measure using least eigenvalue of the Laplacian matrix $$F(G)$$ Normalized frustration index [17, 22] Normalized minimum number of edges whose removal results in balance Measure Name, reference and description $$D(G)$$ Degree of balance [2, 22] A cycle-based measure representing the ratio of balanced cycles $$C(G)$$ Weighted degree of balance [11] An extension of $$D(G)$$ using cycles weighted by a non-increasing function of length $$D_k(G)$$ Relative $$k$$-balance [3, 8] An extension of $$D(G)$$ placing a non-zero weight only on cycles of length $$k$$ $$T(G)$$ Triangle index [6, 13] A triangle-based measure representing the ratio of balanced triangles ($$D_3(G)$$) $$W(G)$$ Walk-based measure of balance [15, 16] A simplified extension of $$D(G)$$ replacing cycles by closed walks $$A(G)$$ Normalized algebraic conflict [6, 19] A normalized measure using least eigenvalue of the Laplacian matrix $$F(G)$$ Normalized frustration index [17, 22] Normalized minimum number of edges whose removal results in balance Table 1 Measures of partial balance summarized Measure Name, reference and description $$D(G)$$ Degree of balance [2, 22] A cycle-based measure representing the ratio of balanced cycles $$C(G)$$ Weighted degree of balance [11] An extension of $$D(G)$$ using cycles weighted by a non-increasing function of length $$D_k(G)$$ Relative $$k$$-balance [3, 8] An extension of $$D(G)$$ placing a non-zero weight only on cycles of length $$k$$ $$T(G)$$ Triangle index [6, 13] A triangle-based measure representing the ratio of balanced triangles ($$D_3(G)$$) $$W(G)$$ Walk-based measure of balance [15, 16] A simplified extension of $$D(G)$$ replacing cycles by closed walks $$A(G)$$ Normalized algebraic conflict [6, 19] A normalized measure using least eigenvalue of the Laplacian matrix $$F(G)$$ Normalized frustration index [17, 22] Normalized minimum number of edges whose removal results in balance Measure Name, reference and description $$D(G)$$ Degree of balance [2, 22] A cycle-based measure representing the ratio of balanced cycles $$C(G)$$ Weighted degree of balance [11] An extension of $$D(G)$$ using cycles weighted by a non-increasing function of length $$D_k(G)$$ Relative $$k$$-balance [3, 8] An extension of $$D(G)$$ placing a non-zero weight only on cycles of length $$k$$ $$T(G)$$ Triangle index [6, 13] A triangle-based measure representing the ratio of balanced triangles ($$D_3(G)$$) $$W(G)$$ Walk-based measure of balance [15, 16] A simplified extension of $$D(G)$$ replacing cycles by closed walks $$A(G)$$ Normalized algebraic conflict [6, 19] A normalized measure using least eigenvalue of the Laplacian matrix $$F(G)$$ Normalized frustration index [17, 22] Normalized minimum number of edges whose removal results in balance Table 2 Balance in minimally and maximally unbalanced graphs $${K}_n^a$$ (6.1) and $${K}_n^c$$ (6.2) $$\mu(G)$$ $${K}_n^a$$ $${K}_n^c$$ $$D(G)$$ $$\sim 1- {2}/{n}$$ $$\sim \frac{1}{2} + (-1)^n e^{-2}$$ $$C(G)$$ $$\sim 1- {1}/{n}$$ $$\sim \frac{1}{2} - \frac{3n\log n}{2^n}$$ $$D_k(G)$$ $$1-{2k}/{n(n-1)}$$ $$0 , 1$$ $$W(G)$$ $$\sim 1-{2}/{n}$$ $$\sim \frac{1+e^{2-2n}}{2}$$ $$A(G)$$ $$\sim 1-{4}/{n^2}$$ $$0$$ $$F(G)$$ $$1-{4}/{n(n-1)}$$ $$\frac{1}{n},\frac{1}{n-1}$$ $$\mu(G)$$ $${K}_n^a$$ $${K}_n^c$$ $$D(G)$$ $$\sim 1- {2}/{n}$$ $$\sim \frac{1}{2} + (-1)^n e^{-2}$$ $$C(G)$$ $$\sim 1- {1}/{n}$$ $$\sim \frac{1}{2} - \frac{3n\log n}{2^n}$$ $$D_k(G)$$ $$1-{2k}/{n(n-1)}$$ $$0 , 1$$ $$W(G)$$ $$\sim 1-{2}/{n}$$ $$\sim \frac{1+e^{2-2n}}{2}$$ $$A(G)$$ $$\sim 1-{4}/{n^2}$$ $$0$$ $$F(G)$$ $$1-{4}/{n(n-1)}$$ $$\frac{1}{n},\frac{1}{n-1}$$ Table 2 Balance in minimally and maximally unbalanced graphs $${K}_n^a$$ (6.1) and $${K}_n^c$$ (6.2) $$\mu(G)$$ $${K}_n^a$$ $${K}_n^c$$ $$D(G)$$ $$\sim 1- {2}/{n}$$ $$\sim \frac{1}{2} + (-1)^n e^{-2}$$ $$C(G)$$ $$\sim 1- {1}/{n}$$ $$\sim \frac{1}{2} - \frac{3n\log n}{2^n}$$ $$D_k(G)$$ $$1-{2k}/{n(n-1)}$$ $$0 , 1$$ $$W(G)$$ $$\sim 1-{2}/{n}$$ $$\sim \frac{1+e^{2-2n}}{2}$$ $$A(G)$$ $$\sim 1-{4}/{n^2}$$ $$0$$ $$F(G)$$ $$1-{4}/{n(n-1)}$$ $$\frac{1}{n},\frac{1}{n-1}$$ $$\mu(G)$$ $${K}_n^a$$ $${K}_n^c$$ $$D(G)$$ $$\sim 1- {2}/{n}$$ $$\sim \frac{1}{2} + (-1)^n e^{-2}$$ $$C(G)$$ $$\sim 1- {1}/{n}$$ $$\sim \frac{1}{2} - \frac{3n\log n}{2^n}$$ $$D_k(G)$$ $$1-{2k}/{n(n-1)}$$ $$0 , 1$$ $$W(G)$$ $$\sim 1-{2}/{n}$$ $$\sim \frac{1+e^{2-2n}}{2}$$ $$A(G)$$ $$\sim 1-{4}/{n^2}$$ $$0$$ $$F(G)$$ $$1-{4}/{n(n-1)}$$ $$\frac{1}{n},\frac{1}{n-1}$$ Fig. 1. View largeDownload slide Partial balance measured by different methods in Erdős–Rényi network with various number of negative edges (colour version online). (a) The mean values of seven measures of partial balance. (b) The mean values of relative $$k$$-balance $$D_k$$. (c) The standard deviation of $$D$$ and $$C$$. (d) The standard deviation of $$W, T, A$$ and $$F.$$ Fig. 1. View largeDownload slide Partial balance measured by different methods in Erdős–Rényi network with various number of negative edges (colour version online). (a) The mean values of seven measures of partial balance. (b) The mean values of relative $$k$$-balance $$D_k$$. (c) The standard deviation of $$D$$ and $$C$$. (d) The standard deviation of $$W, T, A$$ and $$F.$$ Fig. 2. View largeDownload slide Partial balance measured by different methods in 50% negative 4-regular graphs of different orders $$n$$ and decreasing densities $$4/n-1$$. Fig. 2. View largeDownload slide Partial balance measured by different methods in 50% negative 4-regular graphs of different orders $$n$$ and decreasing densities $$4/n-1$$. Fig. 3. View largeDownload slide Partial balance measured by different methods for $${K}_n^a$$ (6.1). Fig. 3. View largeDownload slide Partial balance measured by different methods for $${K}_n^a$$ (6.1). Fig. 4. View largeDownload slide Partial balance measured by different methods for $${K}_n^c$$ (6.2). Fig. 4. View largeDownload slide Partial balance measured by different methods for $${K}_n^c$$ (6.2). 2. Problem statement and notation Throughout this article, the terms signed graph and signed network will be used interchangeably to refer to a graph with positive and negative edges. We use the term cycle only as a shorthand for referring to simple cycles of the graph. While several definitions of the concept of balance have been suggested, this article will only use the original definition for undirected signed graphs unless explicitly stated. We consider an undirected signed network $$G = (V,E,\sigma),$$ where $$V$$ and $$E$$ are the sets of vertices and edges, and $$\sigma$$ is the sign function $$\sigma: E\rightarrow\{-1,+1\}$$. The set of nodes is denoted by $$V$$, with $$|V| = n$$. The set of edges is represented by $$E$$ including $$m^-$$ negative edges and $$m^+$$ positive edges adding up to a total of $$m=m^+ + m^-$$ edges. The signed adjacency matrix and the unsigned adjacency matrix are defined in (2.1) and (2.2). Entries of A are represented by $$a_{ij}$$. \begin{align}\label{eq1} \textbf{A}{_u}{_v} &= \left\{ \begin{array}{@{}ll} \sigma_{u,v} & \mbox{if } {u,v}\in E \\ 0 & \mbox{if } {u,v}\notin E \end{array} \right.\\ \end{align} (2.1) \begin{align} \label{eq1.1} |\textbf{A}|{_u}{_v} &= \left\{ \begin{array}{@{}ll} 1 & \mbox{if } {u,v}\in E \\ 0 & \mbox{if } {u,v}\notin E \end{array} \right. \end{align} (2.2) The positive degree and negative degree of node $$i$$ are denoted by $$d^+ _{i}$$ and $$d^- _{i}$$ representing the number of positive and negative edges connected to node $$i,$$ respectively. They are calculated based on $$d^+ _{i}=(\sum_j |a_{ij}| + \sum_j a_{ij})/2$$ and $$d^- _{i}=(\sum_j |a_{ij}| - \sum_j a_{ij})/2$$. The degree of node $$i$$ is represented by $$d_i$$ and equals the number of edges connected to node $$i$$. It is calculated based on $$d_{i}= d^+ _{i} + d^- _{i}= \sum_j |a_{ij}|$$. A walk of length $$k$$ in $$G$$ is a sequence of nodes $$v_0,v_1,...,v_{k-1},v_k$$ such that for each $$i=1,2,...,k$$ there is an edge from $$v_{i-1}$$ to $$v_i$$. If $$v_0=v_k$$, the sequence is a closed walk of length $$k$$. If all the nodes in a closed walk are distinct except the endpoints, it is a cycle (simple cycle) of length $$k$$. The sign of a cycle is the product of the signs of its edges. A cycle is balanced if its sign is positive and is unbalanced otherwise. The total number of balanced cycles (closed walks) is denoted by $$O_k ^+$$ ($$Q_k ^+$$). Similarly, $$O_k ^-$$ ($$Q_k ^-$$) denotes the total number of unbalanced cycles (closed walks). The total number of cycles (closed walks) is represented by $$O_k=O_k ^+ + O_k ^-$$ ($$Q_k=Q_k ^+ + Q_k ^-$$). 3. Checking for balance It is essential to have an algorithmic means of checking for balance. We recall several known methods here. The characterization of bi-polarity, that a signed graph is balanced if and only if its vertex set can be partitioned into two subsets such that each negative edge joins vertices belonging to different subsets [2], leads to an obvious breadth-first search procedure similar to the usual algorithm for determining whether a graph is bipartite. An alternate algebraic criterion is that the eigenvalues of the signed and unsigned adjacency matrices are equal if and only if the signed network is balanced [4]. For our purposes the following additional method of detecting balance is also important. We define the switching function$$g(X)$$ operating over a set of vertices $$X\subseteq V$$ as follows: \begin{align} \label{eq1.2} \sigma^{g(X)} _{(u,v)}= \left\{ \begin{array}{ll} \sigma_{u,v} & \mbox{if } {u,v}\in X \ \text{or} \ {u,v}\notin X \\ -\sigma_{u,v} & \mbox{if } u \in X \ \text{and} \ v\notin X \ \text{or} \ u \notin X \ \text{and} \ v \in X. \end{array} \right. \end{align} (3.1) As the sign of cycles remains the same when $$g$$ is applied, any balanced graph can switch to an all-positive signature [5]. Accordingly, a balance detection algorithm can be developed by constructing a switching rule on a spanning tree and a root vertex, as suggested in [5]. Finally, another method of checking for balance in connected signed networks makes use of the signed Laplacian matrix defined by $$\textbf{L}=\textbf{D}-\textbf{A}$$ where $$\textbf{D}_{ii} = \sum_j |a_{ij}|$$ is the diagonal matrix of degrees. The signed Laplacian matrix, $$\textbf{L}$$, is positive-semidefinite, that is all of its eigenvalues are non-negative [6]. The smallest eigenvalue of $$\textbf{L}$$ equals 0 if and only if the graph is balanced [7]. 4. Measures of partial balance Several ways of measuring the extent to which a graph is balanced have been introduced by researchers. We discuss three families of measures here and summarize them in Table 1. 4.1 Measures based on cycles The simplest of such measures is the degree of balance suggested by Cartwright and Harary [2], which is the fraction of balanced cycles: \begin{align}\label{eq1.5} D(G)= \frac {\sum \limits_{k=3}^n O_k ^+ } {\sum \limits_{k=3}^n O_k}. \end{align} (4.1) There are other cycle-based measures closely related to $$D(G)$$. The relative $$k$$-balance, denoted by $$D_k(G)$$ and formulated in (4.2) is a cycle-based measure where the sums defining the numerator and denominator of $$D(G)$$ are restricted to a single term of fixed index $$k$$ [3, 8]. The special case $$k=3$$ is called the triangle index, denoted by $$T(G)$$. \begin{align}\label{eq1.6} D_k(G)= \frac {O_k ^+} {O_k}. \end{align} (4.2) Giscard et al. have recently introduced efficient algorithms for counting simple cycles [9] making it possible to use various measures related to $$D_k(G)$$ to evaluate balance in signed networks [10]. A generalization is weighted degree of balance, obtained by weighting cycles based on length as in (4.3), in which $$f(k)$$ is a monotonically decreasing non-negative function of the length of the cycle. \begin{align}\label{eq1.7} C(G)=\frac{\sum \limits_{k=3}^n f(k) O_k ^+ }{\sum \limits_{k=3}^n f(k) O_k} \end{align} (4.3) The selection of an appropriate weighting function is briefly discussed by Norman and Roberts [11], suggesting functions such as $$1/k,1/k^2,1/2^k$$, but no objective criterion for choosing such a weighting function is known. We consider two weighting functions $$1/k$$ and $$1/k!$$ for evaluating $$C(G)$$ in this article. Given the typical distribution of cycles of different length, the function $$f(k)=1/k$$ provides a rate of decay slower than the typical rate of increase in cycles of length $$k$$, resulting in $$C(G)$$ being dominated by longer cycles. The function $$f(k)=1/k!$$ provides a rate of decay faster than the typical rate of increase in cycles of length $$k$$, resulting in $$C(G)$$ being mostly determined by shorter cycles. Although fast algorithms are developed for counting and listing cycles of undirected graphs [9, 12], the number of cycles grows exponentially with the size of a typical real-world network. To tackle the computational complexity, Terzi and Winkler [13] used $$D_3(G)$$ in their study and replaced the triangles by closed walks of length $$3$$. The triangle index can be calculated efficiently by the formula in (4.4) where $${\mathrm{Tr}}(A)$$ denotes the trace of $$A$$. \begin{align}\label{eq1.8} T(G)= D_3(G) = \frac{O_3 ^+}{O_3} = \frac{{\mathrm{Tr}}(A^3)+{\mathrm{Tr}}(|A|^3)}{2 {\mathrm{Tr}}(|A|^3)} \end{align} (4.4) The relative signed clustering coefficient is suggested as a measure of balance by Kunegis [6], taking insight from the classic clustering coefficient. After normalization, this measure is equal to the triangle index. Having access to an easy-to-compute formula [13] for $$T(G)$$ obviates the need for a clustering-based calculation which requires iterating over all triads in the graph. Bonacich argues that dissonance and tension are unclear in cycles of length greater than three [14], justifying the use of the triangle index to analyse structural balance. However the neglected interactions may represent potential tension and dissonance, though not as strong as that represented by unbalanced triads. One may consider a smaller weight for longer cycles, thereby reducing their impact rather than totally disregarding them. Note that $$C(G)$$ is a generalization of both $$D(G)$$ and $$D_k(G)$$. In all the cycle-based measures, we consider a value of $$1$$ for the case of division by 0. This allows the measures $$D(G)$$ and $$C(G)$$ ($$D_{k}(G)$$) to provide a value for acyclic graphs (graphs with no $$k$$-cycle). 4.2 Measures related to eigenvalues Beside checking cycles, there are computationally easier approaches to structural balance such as the walk-based approach. The walk-based measure of balance is suggested by Pelino and Maimone [15] with more weight placed on shorter closed walks than the longer ones. Let $${\mathrm{Tr}}(e^\textbf{A})$$ and $${\mathrm{Tr}}(e^{|\textbf{A}|})$$ denote the trace of the matrix exponential for $$\textbf{A}$$ and $$|\textbf{A}|$$ respectively. In Equation (4.5), closed walks are weighted by a function with a relatively fast rate of decay compared to functions suggested in [11]. The weighted ratio of balanced to total closed walks is formulated in (4.5). \begin{align}\label{eq2} W(G)= \frac {K(G)+1}{2}, \quad K(G)=\frac{\sum \limits_{k}\frac{Q_k ^+ - Q_k ^-}{k!}}{\sum \limits_{k} \frac{Q_k ^+ + Q_k ^-}{k!}}=\frac{{\mathrm{Tr}}(e^\textbf{A})}{{\mathrm{Tr}}(e^{|\textbf{A}|})}. \end{align} (4.5) Regarding the calculation of $${\mathrm{Tr}}(e^\textbf{A})$$, one may use the standard fact that $$\textbf{A}$$ is a symmetric matrix for undirected graphs. It follows that $${\mathrm{Tr}}(e^\textbf{A})=\sum _{i} e^{\lambda_i}$$ in which $$\lambda_i$$ ranges over eigenvalues of $$\textbf{A}$$. The idea of a walk-based measure was then used by Estrada and Benzi [16]. They have tested their measure on five signed networks resulting in values inclined towards imbalance which were in conflict with some previous observations [6, 17]. The walk-based measure of balance suggested in [16] have been scrutinized in the subsequent studies [10, 18]. Giscard et al. discusses how using closed walks in which the edges might be repeated results in mixing the contribution of various cycle lengths and leads to values that are difficult to interpret [10]. Singh et al. criticize the walk-based measure from another perspective and explains how the inverse factorial weighting distorts the measure towards showing imbalance [18]. The idea of another eigenvalue-based measure comes from spectral graph theory [19]. The smallest eigenvalue of the signed Laplacian matrix provides a measure of balance called algebraic conflict [19]. Algebraic conflict, denoted by $$\lambda(G)$$, equals zero if and only if the graph is balanced. Positive-semidefiniteness of $$\textbf{L}$$ results in $$\lambda(G)$$ representing the amount of imbalance in a signed network. Algebraic conflict is used in [6] to compare the level of balance in online signed networks of different sizes. Moreover, Pelino and Maimone analysed signed network dynamics based on $$\lambda(G)$$ [15]. Bounds for $$\lambda(G)$$ are investigated by [7] leading to recent applicable results in [20, 21]. Belardo and Zhou prove that $$\lambda(G)$$ for a fixed $$n$$ is maximized by the complete all-negative graph of order $$n$$ [21]. Belardo shows that $$\lambda(G)$$ is bounded by $$\lambda_\text{max}(G)=\overline{d}_{\text{max}} -1$$ in which $$\overline{d}_{\text{max}}$$ represents the maximum average degree of endpoints over graph edges [20]. We use this upper bound to normalize algebraic conflict. Normalized algebraic conflict, denoted by $$A(G)$$, is expressed in (4.6). \begin{align}\label{eq3} A(G)=1- \frac {\lambda(G)}{\overline{d}_{\text{max}} -1} , \quad \overline{d}_{\text{max}}=\max_{(u,v)\in E} (d_u+d_v)/2. \end{align} (4.6) 4.3 Measures based on frustration A quite different measure is the frustration index. Originally proposed for applications on ferromagnetic molecules, it is also referred to as the line index for balance by [22]. A set $$E^*$$ of edges is called deletion-minimal if deleting all edges in $$E^*$$ results in a balanced graph, but no proper subset of $$E^*$$ has this property. Each edge in $$E^*$$ lies on an unbalanced cycle and every unbalanced cycle of the network contains an odd number of edges in $$E^*$$ [5]. Iacono et al. showed that $$L(G)$$ equals the minimum number of unbalanced fundamental cycles induced over all spanning trees of the graph [23]. The graph resulted from deleting all edges in $$E^*$$ is called a balanced transformation of a signed graph. The frustration index equals the minimum cardinality among deletion-minimal sets as in (4.7). \begin{align}\label{eq5.0} L(G)=\min \limits_{E^*} {|E^*|.} \end{align} (4.7) Similarly, in a setting where each vertex is given a black or white colour, if the endpoints of positive (negative) edges have different colours (same colour), they are ‘frustrated’. The frustration index is therefore the smallest number of frustrated edges over all possible 2-colourings of the nodes. $$L(G)$$ is hard to compute as the special case with all edges being negative is equivalent to the MAXCUT problem [24], which is known to be NP-hard. There are upper bounds for the frustration index such as $$L(G)\leq m^-$$ which states the obvious result of removing all negative edges. Facchetti et al. have used computational methods related to Ising spin glass models to estimate the frustration index in relatively large online social networks [17]. Using an estimation of the frustration index obtained by a heuristic algorithm, they concluded that the online signed networks are extremely close to total balance; an observation that contradicts some other research studies like [16]. The number of frustrated edges in special Erdős–Rényi graphs is analysed by El Maftouhi et al. [25]. It follows a binomial distribution with parameters $$n(n-1)/2$$ and $$p/2$$ in which $$p$$ represents equal probabilities for positive and negative edges in Erdős–Rényi graph. Therefore, the expected number of frustrated edges is $$n(n-1)p/4$$. They also prove that such a network is almost always not balanced when $$p \geq (\log2)/n$$. It is straightforward to prove that frustration index is equal to the minimum number of negative edges over all switching functions [5]. Moreover, if $$m^- {(G^{g(X)})}=L(G)$$ then every vertex under switching $${g(X)}$$ satisfies $$d^- _{v^{g(X)}} \leq d^+ _{v^{g(X)}}$$. Tomescu [26] proves that the frustration index is bounded by $$\left\lfloor \left(n-1 \right)^2/4 \right\rfloor$$. Bounds for the largest number of frustrated edges for a graph with $$n$$ nodes and $$m$$ edges over all possible 2-colourings of the nodes are provided by [27]. It that $${L(G)} \leq {m}/{2}$$; an upper bound that is not necessarily tight. Another upper bound for the frustration index is reported in [23] referred to as the worst-case upper bound on the consistency deficit. However, the frustration index values in complete graphs with all negative edges shows that the upper bound is incorrect. In order to compare with the other indices which take values in the unit interval and give the value $$1$$ for balanced graphs, we suggest normalized frustration index, denoted by $$F(G)$$ and formulated in (4.8). \begin{align}\label{eq5.1} F(G)=1- \frac {L(G)}{m/2} \end{align} (4.8) Using a different upper bound for normalizing the frustration index, we discuss another frustration-based measure in Section 6.3 and formulate it in Eq. 6.1. 4.4 Other methods of evaluating balance Balance can also be analysed by blockmodelling based on iteratively calculating Pearson moment correlations from the columns of $$\textbf{A}$$ [28]. Blockmodelling reveals increasingly homogeneous sets of vertices. Doreian and Mrvar discuss this approach in partitioning signed networks [29]. Applying the method to Correlates of War data on positive and negative international relationships, they refute the hypothesis that signed networks gradually move towards balance using blockmodelling alongside some variations of $$D(G)$$ and $$L(G)$$ [30]. Moreover, there are probabilistic methods that compare the expected number of balanced and unbalanced triangles in the signed network and its reshuffled version [31–34]. As long as these measures are used to evaluate balance, the result will not be different to what $$T(G)$$ provides alongside a basic statistical testing of its value against reshuffled networks. Some researchers suggest that studying the structural dynamics of signed networks is more important than measuring balance [35, 36]. This approach is usually associated with considering an energy function to be minimized by local graph operations decreasing the energy. However, the energy function is somehow a measure of network imbalance which requires a proper definition and investigation of axiomatic properties. Seven measures of partial balance investigated in this article are outlined in Table 1. Outline of the rest of the article We started by discussing balance in signed networks in Sections 2 and 3 and reviewed different measures in Section 4. We will provide some observations on synthetic data in Figs 1–4 in Sections 5 and 6 to demonstrate the values of different measures. The reader who is not particularly interested in the analysis of measures using synthetic data may directly go to Section 7 in which we introduce axioms and desirable properties for measures of partial balance. In Section 8, we provide some recommendations on choosing a measure and discuss how using unjustified measures has led to conflicting observations in the literature. The numerical results on real-signed networks are presented in Section 9. 5. Numerical results on synthetic data In this section, we start with a brief discussion on the relationship between negative edges and imbalance in networks. According to the definition of structural balance, all-positive signed graphs (merely containing positive edges) are totally balanced. Intuitively, one may expect that all-negative signed graphs are very unbalanced. Perhaps another intuition derived by assuming symmetry is that increasing the number of negative edges in a network reduces partial balance proportionally. We analyse partial balance in randomly generated graphs to show neither of the intuitions is correct. Our motivation for analysing balance in such graphs is to gain an understanding of the behaviour of the measures and their connections with signed graph parameters like $$m^-,n$$ and density. We use randomly generated graphs as the underlying structure is not important for this analysis. 5.1 Erdős–Rényi random network with various $$m^-$$ We calculate measures of partial balance, denoted by $$\mu(G)$$, for an Erdős–Rényi random network with a various number of negative edges. Figure 1 demonstrates the partial balance measured by different methods. For each data point, we report the average of 50 runs, each assigning negative edges at random to the fixed underlying graph. The subfigures (c) and (d) of Fig. 1 show the mean along with $$\pm 1$$ standard deviation. Measures $$D(G)$$ and $$C(G)$$ with $$f(k)=1/k$$ are observed to tend to $$0.5$$ where $$m^- > 5$$, not differentiating partial balance in graphs with a non-trivial number of negative edges. Given the typical distribution of cycles of different length, we expect $$D(G)$$ and $$C(G)$$ with $$f(k)=1/k$$ to be mostly determined by longer cycles that are much more frequent. For this particular graph, cycles with a length of 10 and above account for more than 96% of the total cycles in the graph. Such long cycles tend to be balanced roughly half the time for almost all values of $$m^-$$ (for all the values within the range of $$5 \leq m^- \leq 45$$ in the network considered here). The perfect overlap of data points for $$D(G)$$ and $$C(G)$$ with $$f(k)=1/k$$ in Fig. 1 shows that using a linear rate of decay does not make a difference. One may think that if we use $$D_k(G)$$ which does not mix cycles of different length, it may circumvent the issues. However, subfigure (b) of Fig. 1 demonstrating values of $$D_k(G)$$ for different cycle lengths shows the opposite. It shows not only does $$D_k(G)$$ not resolve the problems of lack of sensitivity and clustering around $$0.5$$, but it behaves unexpectedly with substantially different values based on the parity of $$k$$ when $$m^- > 35$$. $$C(G)$$ weighted by $$f(k)=1/k!$$ and mostly determined by shorter cycles, decreases slower than $$D(G)$$ and then provides values close to $$0.5$$ for $$m^-\geq10$$. $$W(G)$$ drops below $$0.6$$ for $$m^-=10$$ and then clusters around $$0.55$$ for $$m^->10$$. $$T(G)$$ is the measure with a wide range of values symmetric to $$m^-$$. The single most striking observation to emerge is that $$A(G)$$ seems to have a completely different range of values, which we discuss further in Section 6.3. A steady linear decrease is observed from $$F(G)$$ for $$m^-\leq10$$. 5.2 Four-regular random networks of different orders To investigate the impacts of graph order (number of nodes) and density on balance, we computed the measures for randomly generated 4-regular graphs with 50% negative edges. Intuitively, we expect values to have low variation and no trends for similarly structured graphs of different orders. Figure 2 demonstrates the analysis in a setting where the degree of all the nodes remains constant, but the density ($$4/n-1$$) is decreasing in larger graphs. For each data point the average and standard deviation of 100 runs are reported. In each run, negative weights are randomly assigning to half of the edges in a fixed underlying 4-regular graph of order $$n$$. According to Fig. 2, the four measures differ not only in the range of values, but also in their sensitivity to the graph order and density. First, $$W(G) \rightarrow 1$$ when $$n \rightarrow \infty$$ for larger graphs although the graphs are structurally similar, which goes against intuition. Clustered around $$0.5$$ is $$T(G)$$ which features a substantial standard deviation for 4-regular random graphs. Values of $$A(G)$$ are around $$0.8$$ and do not seem to change substantially when $$n$$ increases. $$F(G)$$ provides stationary values around $$0.7$$ when $$n$$ increases. While $$\lambda(G)$$ and $$L(G)$$ depend on the graph order and size, the relative constancy of $$A(G)$$ and $$F(G)$$ values suggest the normalized measures $$A(G)$$ and $$F(G)$$ are largely independent of the graph size and order, as our intuition expects. We further discuss the normalization of $$A(G)$$ and $$F(G)$$ in Section 6.3. 6. Analytical results on synthetic data In this section, we analyse the capability of measuring partial balance in some families of specially structured graphs. Closed-form formulae for the measures in specially structured graphs are provided in Table 2. The two families of complete signed graphs that we investigate are as follows in 6.1 and 6.2. 6.1 Minimally unbalanced complete graphs with a single negative edge The first family includes complete graphs with a single negative edge, denoted by $${K}_n^a$$. Such graphs are only one edge away from a state of total balance. It is straight-forward to provide closed-form formulae for $$\mu({K}_n^a)$$ as expressed in (A.4) – (A.10) in Appendix A. In $${K}_n^a$$, intuitively we expect $$\mu({K}_n^a)$$ to increase with $$n$$ and $$\mu({K}_n^a) \rightarrow 1$$ as $$n \rightarrow \infty$$. We also expect the measure to detect the imbalance in $${K}_3^a$$ (a triangle with one negative edge). Figure 3 demonstrates the behaviour of different indices for complete graphs with one negative edge. $$W({K}_n^a)$$ gives unreasonably large values for $$n < 5$$. Except for $$W({K}_n^a)$$, the measures are co-monotone over the given range of $$n$$. 6.2 Maximally unbalanced complete graphs with all-negative edges The second family of specially structured graphs to analyse includes all-negative complete graphs denoted by $${K}_n^c$$. The indices are calculated in (A.12)–(A.18) in Appendix A. Our calculations for $$L({K}_n^c)$$ show that the upper bound suggested for the frustration index in [23] is incorrect. Intuitively, we expect $$\mu({K}_n^c) \rightarrow 0$$ as $$n \rightarrow \infty$$. Figure 4 illustrates $$D({K}_n^c)$$ oscillating around $$0.5$$ and $$W({K}_n^c),C({K}_n^c) \rightarrow 0.5$$ as $$n$$ increases. We explain the oscillation of $$D({K}_n^c)$$ in Appendix A. Clearly, measures $$D(G)$$, $$C(G)$$ and $$W(G)$$ do no work well for the family of graphs considered here. Figure 4 shows that $$F({K}_n^c) \rightarrow 0$$ as $$n \rightarrow \infty$$ as expected based on Table 2. 6.3 Normalization of the measures It is worth mentioning that measures of partial balance may lead to different maximally unbalanced complete graphs. Based on $$\lambda(G)$$ and $$L(G)$$, $${K}_n^c$$ represents the family of maximally unbalanced graphs, while it is merely one family among the maximally unbalanced graphs according to $$T(G)$$. Estrada and Benzi have found complete graphs comprised of one cycle of $$n$$ positive edges with the remaining pairs of nodes connected by negative edges to be a family of maximally unbalanced graphs based on $$W(G)$$ [16], while this argument is not supported by any other measures. As the signs of cycles in a graph are not independent, the structure of maximally unbalanced graphs under the cycle-based measures, $$D(G)$$ and $$C(G)$$, is not known. A simple comparison of $$L({K}_n^c)$$ (calculations provided in (A.17) in Appendix A) and the proposed upper bound $$m/2=(n^2-n)/4$$ reveals substantial gaps. These gaps equal $$n/4$$ for even $$n$$ and $$(n-1)/4$$ for odd $$n$$. This supports the previous discussions on looseness of $$m/2$$ as an upper bound for frustration index. Assuming $${K}_n^c$$ to be ‘the maximally unbalanced graphs’ under $$L(G)$$, $$\lfloor m/2-(n-1)/4 \rfloor$$ would be a tight upper bound for the frustration index. This allows a modified version of normalized frustration index, denoted by $$F^\prime (G)$$ and defined in (6.1), to take the value zero for $${K}_n^c$$. \begin{align} \label{eq5.8} F^\prime (G)=1-L(G)/ \lfloor m/2-(n-1)/4 \rfloor \end{align} (6.1) Similarly, the upper bound, $$\lambda_\text{max}(G)$$, used to normalize algebraic conflict, is not tight for many graphs. For instance, in the Erdős–Rényi studied in Section 5 with $$m=m^-$$, the existence of an edge with $$\overline{d}_{\text{max}}=9$$ makes $$\lambda_\text{max}(G)=8$$, while $$\lambda(G)=1.98$$. The two observations mentioned above suggest that tighter upper bounds can be used for normalization. However, the statistical analysis we use in Section 9 to evaluate balance in real networks is independent of the normalization method, so we do not pursue this question further now. 6.4 Expected values of the cycle-based measures Relative $$k$$-balance, $$D_k (G)$$, is proved by El Maftouhi et al. [25] to tend to $$0.5$$ for Erdős–Rényi graphs such that the probability of an edge being negative is equal to $$1/2$$. Moreover, Giscard et al. discuss the probability distribution of $$1 - D_k(G)$$. Their discussion is based on a model in which sign of any edge is negative with a fixed probability [10, Section 4.2]. We use the same model to present some simple observations that appear not to have been noticed by previous authors advocating for the use of cycle-based measures. We are going to take a different approach from that of Giscard et al. and merely calculate the expected values of cycle-based measures in general, rather than the full distribution under additional assumptions. Note that, for an arbitrary graph, $${O_k ^+}/ {O_k}$$ gives the probability that a randomly chosen $$k$$-cycle is balanced and is denoted by $$B_{(k,q)}$$. Theorem 6.1 Let $$G$$ be a graph and consider the sign function obtained by independently choosing each edge to be negative with probability $$q$$, and positive otherwise. Then \begin{align} E(D_k(G)) & = (1+{(1-2q)}^k)/2. \label{eq1.9} \end{align} (6.2) Proof. Note that a cycle is balanced if and only if it has an even number of negative edges. Thus $$E\left(\frac {O_k ^+} {O_k}\right) = \sum \limits_{i \: \text{even}} {\binom{k}{i}} q^i (1-q)^{k-i}$$ (compare with [10, Eq. 4.1]). This simplifies to the stated formula (details of calculations are given in Appendix A). □ Note that the expected values are independent of the graph structure and obtaining them does not require making any assumptions on the signs of cycles being independent random variables. As the signs of the edges are independent random variables, the expected value of $$B_{(k,q)}$$ can be obtained by summing on all cases having an even number of negative signs in the $$k$$-cycle. Based on (6.2), $$E(D_k(G)) = 1$$ when $$q = 0$$ and $$E(D_k(G)) = 0.5$$ when $$q = 0.5$$ supporting our intuitive expectations. However, when $$q = 1$$, $$E(D_k(G))$$ takes extremal values based on the parity of $$k$$ which is a major problem as previously observed in the subfigure (b) of Fig. 1. It is clear to see that the parity of $$k$$ makes a substantial difference to $$D_k(G)$$ when a considerable proportion of edges are negative. Theorem 6.2 Let $$G$$ be a graph and consider the sign function obtained by independently choosing each edge to be negative with probability $$q$$, and positive otherwise. Then \begin{align} E(D(G)) & = \frac{1}{2} \frac { \sum \limits_{k=3}^n (1+{(1-2q)}^k)(O_k) } {\sum \limits_{k=3}^n O_k} \label{eq1.10} \end{align} (6.3) Proof. The random variable $${O_k ^+}$$ can be written as $${O_k ^+}= B_{(k,q)}.{O_k}$$. Taking expected value from the two sides gives $$E({O_k ^+})= {O_k}.E(B_{(k,q)})$$ as $$O_k$$ is a constant for a fixed $$k$$. This completes the proof using the result from Theorem 6.1. □ Note that the exponential decay of the factor $$(1-2q)^k$$ reduces the contribution for large $$k$$, and small values of $$k$$ will dominate for many graphs. For example, if $$q=0.2$$ the expression for $$E(D(G))$$ simplifies to $$\frac{1}{2} + \frac{1}{2} \frac{ \sum \limits_{k=3}^n{0.6}^k O_k}{\sum \limits_{k=3}^n O_k}.$$ For many graphs encountered in practice, $$O_k$$ will grow exponentially with $$k$$, but at a rate less than $$1/0.6$$, so the tail contribution will be small. Larger values of $$q$$ only make this effect more pronounced. Thus we expect that $$E(D(G))$$ will often be very close to $$0.5$$ in signed graphs with a reasonably large fraction of negative edges (we have already seen such a phenomenon in Section 5.1). A similar conclusion can be made for $$C(G)$$. This casts doubt on the usefulness of the measures that mix cycles of different length whether weighted or not. While we have also observed many problems involving values of cycle-based measures on synthetic data in other parts of Sections 5 and 6, we will continue evaluating their axiomatic properties in Section 7 and then summarize the methodological findings in Section 8. 7. Axiomatic framework of evaluation The results in Sections 5 and 6 indicate that the choice of measure substantially affects the values of partial balance. Besides that, the lack of a standard measure calls for a framework of comparing different methods. Two different sets of axioms are suggested in [11], which characterize the measure $$C(G)$$ inside a smaller family (up to the choice of $$f(k)$$). Moreover, the theory of structural balance itself is axiomatized in [37]. However, to our knowledge, axioms for general measures of balance have never been developed. Here we provide the first set of axioms and desirable properties for measures of partial balance, in order to shed light on their characteristics and performance. 7.1 Axioms for measures of partial balance We define a measure of partial balance to be a function $$\mu$$ taking each signed graph to an element of $$[0,1]$$. Worthy of mention is that some of these measures were originally defined as a measure of imbalance (algebraic conflict, frustration index and the original walk-based measure) calibrated at $$0$$ for completely balanced structures, so that some normalization was required, and perhaps our normalization choices can be improved on (see Section 6.3). As the choice of $$m/2$$ as the upper bound for normalizing the line index of balance was somewhat arbitrary, another normalized version of frustration index is defined in (7.1). \begin{align} \label{eq6} X(G)=1-L(G)/{m^-} \end{align} (7.1) Before listing the axioms, we justify the need for an axiomatic evaluation of balance measures. As an attempt to understand the need for axiomatizing measures of balance, we introduce two unsophisticated and trivial measures that come to mind for measuring balance. The fraction of positive edges, denoted by $$Y(G)$$, is defined in (7.2) on the basis that all-positive signed graphs are balanced. Moreover, a binary measure of balance, denoted by $$Z(G)$$, is defined in (7.3). While $$Y(G)$$ and $$Z(G)$$ appear to be irrelevant, there is currently no reason not to use such measures. \begin{gather} \label{eq7} Y(G)=m^{+}/m \quad \\ \label{eq8} \end{gather} (7.2) \begin{gather} Z(G) = \left\{ \begin{array}{ll} 1 & \mbox{if } G \, \mbox{is totally balanced} \\ 0 & \mbox{if } G \, \mbox{is not balanced}. \end{array} \right. \end{gather} (7.3) We consider the following notation for referring to basic operations on signed graphs: $$G^{g(X)}$$ denotes signed graph $$G$$ switched by $$g(X)$$ (switched graph). $$G\oplus H$$ denotes the disjoint union of two signed graphs $$G$$ and $$H$$ (disjoint union). $$G\ominus e$$ denotes $$G$$ with $$e$$ deleted (removing an edge). $$G\ominus E^*$$ denotes $$G$$ with deletion-minimal edges deleted (balanced transformation). $$G\oplus C^+_3$$ denotes the disjoint union of graphs $$G$$ and a positive 3-cycle (adding a balanced 3-cycle). $$G\oplus C^-_3$$ denotes the disjoint union of graphs $$G$$ and a negative 3-cycle (adding an unbalanced 3-cycle). $$e\in E^*$$ denotes an edge in the deletion-minimal set. $$G \ominus E^* \oplus e$$ denotes the balanced transformation of a graph with an edge $$e$$ added to it. We list the following axioms: A1 $$0 \leq \mu(G) \leq 1$$. A2 $$\mu(G) = 1$$ if and only if $$G$$ is balanced. A3 If $$\mu(G) \leq \mu(H)$$, then $$\mu(G) \leq \mu(G\oplus H) \leq \mu(H)$$. A4 $$\mu(G^{g(X)}) = \mu(G)$$. The justifications for such axioms are connected to very basic concepts in balance theory. We consider A1 essential in order to make meaningful comparisons between measures. Introducing the notion of partial balance, we argue that total balance, being the extreme case of partial balance, should be denoted by an extremal value as in A2. In A3, the argument is that overall balance of two disjoint graphs is bounded between their individual balances. This also covers the basic requirement that the disjoint union of two copies of graph $$G$$ must have the same value of partial balance as $$G$$. Switching nodes should not change balance [5] as in A4. Table 3 shows how some measures fail on particular axioms. The results provide important insights into how some of the measures are not suitable for measuring partial balance. A more detailed discussion on the proof ideas and counterexamples related to Table 3 is provided in Appendix B. Table 3 Different measures satisfying (✓) or failing (✗) axioms $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ A1 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ A2 ✓ ✓ ✓ ✗ ✗ ✓ ✓ ✗ ✓ A3 ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✓ A4 ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗ ✓ $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ A1 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ A2 ✓ ✓ ✓ ✗ ✗ ✓ ✓ ✗ ✓ A3 ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✓ A4 ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗ ✓ Table 3 Different measures satisfying (✓) or failing (✗) axioms $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ A1 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ A2 ✓ ✓ ✓ ✗ ✗ ✓ ✓ ✗ ✓ A3 ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✓ A4 ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗ ✓ $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ A1 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ A2 ✓ ✓ ✓ ✗ ✗ ✓ ✓ ✗ ✓ A3 ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✓ A4 ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗ ✓ 7.2 Some other desirable properties We also consider four desirable properties that formalize our expectations of a measure of partial balance. We do not consider the following as axioms in that they are based on adding or removing 3-cycles and edges which may bias the comparison in favour of cycle-based and frustration-based measures. Positive and negative 3-cycles are very commonly used to explain the theory of structural balance which makes B1 and B2 obvious requirements. Removing an edge from a set of edges whose removal results in balance, should not decrease balance as in B3. Finally, adding such an edge should not increase balance as in B4. B1 If $$\mu(G) \neq 1$$, then $$\mu(G\oplus C^+_3) > \mu(G)$$. B2 If $$\mu(G) \neq 0$$, then $$\mu(G\oplus C^-_3) < \mu(G)$$ B3 If $$e\in E^*$$, then $$\mu(G \ominus e) \geq \mu(G)$$. B4 If $$\mu(G)\neq 0$$ and $$\mu(G \ominus E^* \oplus e)\neq 1$$, then $$\mu(G \oplus e) \leq \mu(G)$$. Table 4 shows how some measures fail on particular desirable properties. It is worth mentioning that the evaluation in Tables 3–4 is somewhat independent of parametrization: for each strictly increasing function $$h$$ such that $$h(0)=0$$ and $$h(1)=1$$, the results in Tables 3–4 hold for $$h(\mu(G))$$. Proof ideas and counterexamples related to Table 4 is provided in Appendix B. Table 4 Different measures satisfying (✓) or failing (✗) desirable properties $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ B1 ✓ ✓ ✓ ✗ ✓ ✓ ✗ ✗ ✗ B2 ✓ ✓ ✗ ✗ ✗ ✗ ✗ ✗ ✓ B3 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✗ B4 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✓ $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ B1 ✓ ✓ ✓ ✗ ✓ ✓ ✗ ✗ ✗ B2 ✓ ✓ ✗ ✗ ✗ ✗ ✗ ✗ ✓ B3 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✗ B4 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✓ Table 4 Different measures satisfying (✓) or failing (✗) desirable properties $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ B1 ✓ ✓ ✓ ✗ ✓ ✓ ✗ ✗ ✗ B2 ✓ ✓ ✗ ✗ ✗ ✗ ✗ ✗ ✓ B3 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✗ B4 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✓ $$D(G)$$ $$C(G)$$ $$W(G)$$ $$D_k(G)$$ $$A(G)$$ $$F(G)$$ $$X(G)$$ $$Y(G)$$ $$Z(G)$$ B1 ✓ ✓ ✓ ✗ ✓ ✓ ✗ ✗ ✗ B2 ✓ ✓ ✗ ✗ ✗ ✗ ✗ ✗ ✓ B3 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✗ B4 ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✗ ✓ Another desirable property, which we have not formulated as a formal requirement owing to its vagueness, is that the measure takes on a wide range of values. For example, $$D(G)$$ and $$C(G)$$ tend rapidly to $$0.5$$ as $$n$$ increases which makes their interpretation and possibly comparison with other measures difficult. A possible way to formalize it would be expecting $$\mu(G)$$ to give $$0$$ and $$1$$ on each complete graph of order at least $$3$$, for some assignment of signs of edges. This condition would be satisfied by $$T(G)$$ and $$A(G)$$, as well as $$F^\prime(G)$$. However, $$D(G),C(G)$$ and $$W(G)$$ would not satisfy this condition due to the existence of balanced cycles and closed walks in complete signed graphs of general orders. Moreover, the very small standard deviation of $$D(G)$$, $$C(G)$$ and $$W(G)$$ makes statistical testing against the balance of reshuffled networks complicated. The measures $$D(G)$$, $$C(G)$$ and $$W(G)$$ also have shown some unexpected behaviours on various types of graphs discussed in Sections 5 and 6. 8. Discussion on methodological findings Taken together, the findings in Sections 5–7 give strong reason not to use cycle-based measures $$D(G)$$ and $$C(G)$$, regardless of the weights. The major issues with cycle-based measures $$D(G)$$ and $$C(G)$$ include the very small variance in randomly generated and reshuffled graphs, lack of sensitivity and clustering of values around 0.5 for graphs with a non-trivial number of negative edges. Recall the numerical analysis of synthetic data in Section 5, analytical results on the expected values of cycle-based measures in Section 6.4, and the numerical values which are difficult to interpret like the oscillation of $$D(G)$$ and values of $$C(G)$$ for $${K}_n^c$$ graphs in Table 2 and Fig. 4. The relative $$k$$-balance which is ultimately from the same family of measures, seems to resolve some, but not all the problems discussed above. However, it fails on several axioms and desirable properties. It is easy to compute $$D_{3}(G)$$ based on closed walks of length 3 [13] and there are recent methods resolving the computational burden of computing $$D_{k}(G)$$ for general $$k$$ [9, 10]. However, $$D_{k>3}(G)$$ cannot be used for cyclic graphs that do not have $$k$$-cycles. Besides, for networks with a large proportion of negative edges, the parity of $$k$$ substantially distorts the values of $$D_{k}(G)$$. Accepting all these shortcomings, one may use $$D_{k}(G)$$ when cycles of a particular length have a meaningful interpretation in the context of study. Walk-based measures like $$W(G)$$ require a more systematic way of weighting to correct for the double-counting of closed walks with repeated edges. The shortcomings of $$W(G)$$ involving the weighting method and contribution of non-simple cycles are also discussed in [10, 18]. Recall that $$W(G) \rightarrow 1$$ in 4-regular graphs when we increase $$n$$ as in our discussion in Section 5.2. Besides, $$W({K}_n^c) \rightarrow 0.5$$ as $$n$$ increases as discussed in Section 6.2. The commonly-observed clustering of values near 0.5 may also present problems. Moreover, the model behind $$W(G)$$ is strange as signs of closed walks do not represent balance or imbalance. For these reasons we do not recommend $$W(G)$$ for future use. The major weakness of the normalized algebraic conflict, $$A(G)$$, seems to be its incapability of evaluating the overall balance in graphs that have more than one connected component. Note that, some of the failures observed for $$A(G)$$ on axioms and desirable properties stem from its dependence on $$\lambda(G)$$ the smallest eigenvalue of the signed Laplacian matrix. $$\lambda(G)$$ might be determined by a component of the graph disconnected from other components and in turn not capturing the overall balance of the graph as a whole. For analysing graphs with just one cyclic connected component, one may use $$A(G)$$ while disregarding the acyclic components. However, if a graph has more than one cyclic connected component, using $$A(G)$$ or $$\lambda(G)$$ is similar to disregarding all but the most balanced connected component in the graph. The three trivial measures, namely $$X(G)$$, $$Y(G)$$ and $$Z(G)$$, fail on various basic axioms and desirable properties in Tables 3 and 4, and also show a lack of sensitivity to the graph, making them inappropriate to be used as measures of balance. Satisfying almost all the axioms and desirable properties, $$F(G)$$ seems to measure something different from what is obtained using all cycles or all $$k$$-cycles, and be worth pursuing in future. Note that, $$L(G)$$ equals the minimum number of unbalanced fundamental cycles [23]; suggesting a connection between the frustration and unbalanced cycles yet to be explored further. We recommend using $$F(G)$$ for graphs up to the largest size and order we have considered in this study. Recently developed optimization models [38] are shown to be capable of computing the frustration index in graphs of up to thousands of nodes and edges. For larger graphs, exact computation of $$L(G)$$ would be time consuming and it can be approximated using a non-zero optimality gap tolerance with the optimization models in [38]. Alternatively, $$A(G)$$ and $$D_k(G)$$ seem to be the other options. Depending on the type of the graph, $$k$$-cycles might not necessarily capture global structural properties. For instance, this would make $$D_3(G)$$ an improper choice for some specific graphs like sparse 4-regular graphs (as in Section 5.2), square grids and sparse graphs with a small number of 3-cycles. Similarly, $$A(G)$$ is not suitable for graphs that have more than one connected component (including many sparse graphs). Notes on previous work In the literature, balance theory is widely used on directed signed graphs. It seems that this approach is questionable in two ways. First, it neglects the fact that many edges in signed digraphs are not reciprocated. Bearing that in mind, investigating balance theory in signed digraphs deals with conflict avoidance when one actor in such a relationship may not necessarily be aware of good will or ill will on the part of other actors. This would make studying balance in directed networks analogous to studying how people avoid potential conflict resulting from potentially unknown ties. Secondly, balance theory does not make use of the directionality of ties and the concepts of sending and receiving positive and negative links. Leskovec et al. compare the reliability of predictions made by competing theories of social structure: balance theory and status theory (a theory that explicitly includes direction and gives quite different predictions) [31]. The consistency of these theories with observations is investigated through large signed directed networks such as Epinions, Slashdot and Wikipedia. The results suggest that status theory predicts accurately most of the time while predictions made by balance theory are incorrect half of the time. This supports the inefficacy of balance theory for structural analysis of signed digraphs. For another comparison of the theories on signed networks, one may refer to a study of eight theories to explain signed tie formation between students [32]. In a parallel line of research on network structural analysis, researchers differentiate between classical balance theory and structural balance specifically in the way that the latter is directional [14]. They consider another setting for defining balance where absence of ties implies negative relationships. This assumption makes the theory limited to complete signed digraphs. Accordingly, 64 possible structural configurations emerge for three nodes. These configurations can be reduced to 16 classes of triads, referred to as 16 MAN triad census, based on the number of Mutual, Asymmetric and Null relationships they contain. There are only 2 out of 16 classes that are considered balanced. New definitions are suggested by researchers in order to make balance theory work in a directional context. According to Prell [39], there is a second, a third and a fourth definition of permissible triads allowing for 3, 7 and 9 classes of all 16 MAN triads. However, there have been many instances of findings in conflict with expectations [39]. Apart from directionality, the interpretation of balance measures is very important. Numerous studies have compared balance measures with their extremal values and found that signed networks are far from balanced, for example [16]. However, with such a strict criterion, we must be careful not to look for properties that are almost impossible to satisfy. A much more systematic approach is to compare values of partial balance in the signed graphs in question to the corresponding values for reshuffled graphs [33, 34] as we have done in Section 9. So far we formalized the notion of partial balance and compared various measures of balance based on their values in different graphs where the underlying structure was not important. We also evaluated the measures based on their axiomatic properties and ruled out the measures that we could not justify. In the next section, we focus on exploring real signed graphs based on the justified methods. 9. Results on real signed networks In this section, we analyse partial balance for a range of signed networks inferred from data sets of positive and negative interactions and preferences. Read’s data set for New Guinean highland tribes [40] is demonstrated as a signed graph (G1) in Fig. 5(a), where dotted lines represent negative edges and solid lines represent positive edges. The fourth time window of Sampson’s data set for monastery interactions [41] (G2) is drawn in Fig. 5(b). There are also data sets of students’ choice and rejection (G3 and G4) [42, 43] as demonstrated in Fig. 5(c) and (d). The last three are converted to undirected signed graphs by considering mutually agreed relations. A further explanation on the details of inferring signed graphs from the choice and rejection data is provided in Appendix C. Fig. 5. View largeDownload slide Four small signed networks visualized where dotted lines represent negative edges and solid lines represent positive edges (a) Highland tribes network (G1), a signed network of 16 tribes of the Eastern Central Highlands of New Guinea [40] (b) Monastery interactions network (G2) of 18 New England novitiates inferred from the integration of all positive and negative relationships [41] (c) Fraternity preferences network (G3) of 17 boys living in a pseudo-dormitory inferred from ranking data of the last week in [42] (d) College preferences network (G4) of 17 girls at an Eastern college inferred from ranking data of house B in [43]. Fig. 5. View largeDownload slide Four small signed networks visualized where dotted lines represent negative edges and solid lines represent positive edges (a) Highland tribes network (G1), a signed network of 16 tribes of the Eastern Central Highlands of New Guinea [40] (b) Monastery interactions network (G2) of 18 New England novitiates inferred from the integration of all positive and negative relationships [41] (c) Fraternity preferences network (G3) of 17 boys living in a pseudo-dormitory inferred from ranking data of the last week in [42] (d) College preferences network (G4) of 17 girls at an Eastern college inferred from ranking data of house B in [43]. A larger signed network (G5) is inferred by [44] through implementing a stochastic degree sequence model on Fowler’s data on Senate bill co-sponsorship [45]. Besides the signed social network data sets, large scale biological networks can be analysed as signed graphs. There are relatively large signed biological networks analysed by [46] and [23] from a balance viewpoint under a different terminology where monotonocity is the equivalence for balance. The two gene regulatory networks we consider are related to two organisms: a eukaryote (the yeast Saccharomyces cerevisiae) and a bacterium (Escherichia coli). Graph G6 represents the gene regulatory network of S. cerevisiae [47]. Graph G7 demonstrates the gene regulatory network of the E. coli [48]. Note that, the densities of these networks are much smaller than the other networks introduced above. In gene regulatory networks, nodes represent genes. Positive and negative edges represent activating connections and inhibiting connections respectively. Figure 6 shows the bill co-sponsorship network as well as biological signed networks. The colour of edges correspond to the signs on the edges (green for $$+1$$ and red for $$-1$$). For more details on the biological data sets, one may refer to [23]. Fig. 6. View largeDownload slide Three larger signed datasets illustrated as signed graphs in which red lines represent negative edges and green lines represent positive edges. (colour version online) (a) The bill co-sponsorship network (G5) of senators [44] (b) The gene regulatory network (G6) of Saccharomyces cerevisiae [47] (c) The gene regulatory network (G7) of the Escherichia coli [48]. Fig. 6. View largeDownload slide Three larger signed datasets illustrated as signed graphs in which red lines represent negative edges and green lines represent positive edges. (colour version online) (a) The bill co-sponsorship network (G5) of senators [44] (b) The gene regulatory network (G6) of Saccharomyces cerevisiae [47] (c) The gene regulatory network (G7) of the Escherichia coli [48]. The results are shown in Table 5. As Fig. 6 shows, graphs G6 and G7 have more than one connected component. Besides the giant component, there are a number of small components that we discard in order to use $$A(G)$$ and $$\lambda(G)$$. Note that, this procedure does not change $$T(G)$$ and $$L(G)$$ as the small components are all acyclic. The values of $$(n, m, m^-)$$ for giant components of G6 and G7 are $$(664, 1064, 220)$$ and $$(1376, 3150, 1302)$$ respectively. Table 5 Partial balance computer for signed graphs (G1–7) and reshuffled graphs Graph: $$(n, m, m^-)$$ Density $$T$$ $$A$$ $$F$$ $$\lambda$$ $$L$$ G1: (16, 58, 29) 0.483 $$\mu(G)$$ 0.87 0.88 0.76 1.04 7 $$\text{Mean}(\mu(G_r))$$ 0.50 0.76 0.49 2.08 14.65 $$\text{SD}(\mu(G_r))$$ 0.06 0.02 0.05 0.20 1.38 Z-score 6.04 5.13 5.54 $$-5.13$$ $$-5.54$$ G2: (18, 49, 12) 0.320 $$\mu(G)$$ 0.86 0.88 0.80 0.75 5 $$\text{Mean}(\mu(G_r))$$ 0.55 0.79 0.60 1.36 9.71 $$\text{SD}(\mu(G_r))$$ 0.09 0.03 0.05 0.18 1.17 Z-score 3.34 3.37 4.03 $$-3.37$$ $$-4.03$$ G3: (17, 40, 17) 0.294 $$\mu(G)$$ 0.78 0.90 0.80 0.50 4 $$\text{Mean}(\mu(G_r))$$ 0.49 0.82 0.62 0.89 7.53 $$\text{SD}(\mu(G_r))$$ 0.11 0.06 0.06 0.30 1.24 Z-score 2.64 1.32 2.85 $$-1.32$$ $$-2.85$$ G4: (17, 36, 16) 0.265 $$\mu(G)$$ 0.79 0.88 0.67 0.71 6 $$\text{Mean}(\mu(G_r))$$ 0.49 0.87 0.64 0.79 6.48 $$\text{SD}(\mu(G_r))$$ 0.14 0.03 0.06 0.17 1.08 Z-score 2.16 0.50 0.45 $$-0.50$$ $$-0.45$$ G5: (100, 2461, 1047) 0.497 $$\mu(G)$$ 0.86 0.87 0.73 8.92 331 $$\text{Mean}(\mu(G_r))$$ 0.50 0.75 0.22 17.46 965.6 $$\text{SD}(\mu(G_r))$$ 0.00 0.00 0.01 0.02 9.08 Z-score 118.5 387.8 69.89 $$-387.8$$ $$-69.89$$ G6: (690, 1080, 220) 0.005 $$\mu(G)$$ 0.54 1.00 0.92 0.02 41 $$\text{Mean}(\mu(G_r))$$ 0.58 1.00 0.77 0.02 124.3 $$\text{SD}(\mu(G_r))$$ 0.07 0.00 0.01 0.00 4.97 Z-score $$-0.48$$ 8.61 16.75 $$-8.61$$ $$-16.75$$ G7: (1461, 3215, 1336) 0.003 $$\mu(G)$$ 0.50 1.00 0.77 0.06 371 $$\text{Mean}(\mu(G_r))$$ 0.50 1.00 0.59 0.06 653.4 $$\text{SD}(\mu(G_r))$$ 0.02 0.00 0.00 0.00 7.71 Z-score $$-0.33$$ 3.11 36.64 $$-3.11$$ $$-36.64$$ Graph: $$(n, m, m^-)$$ Density $$T$$ $$A$$ $$F$$ $$\lambda$$ $$L$$ G1: (16, 58, 29) 0.483 $$\mu(G)$$ 0.87 0.88 0.76 1.04 7 $$\text{Mean}(\mu(G_r))$$ 0.50 0.76 0.49 2.08 14.65 $$\text{SD}(\mu(G_r))$$ 0.06 0.02 0.05 0.20 1.38 Z-score 6.04 5.13 5.54 $$-5.13$$ $$-5.54$$ G2: (18, 49, 12) 0.320 $$\mu(G)$$ 0.86 0.88 0.80 0.75 5 $$\text{Mean}(\mu(G_r))$$ 0.55 0.79 0.60 1.36 9.71 $$\text{SD}(\mu(G_r))$$ 0.09 0.03 0.05 0.18 1.17 Z-score 3.34 3.37 4.03 $$-3.37$$ $$-4.03$$ G3: (17, 40, 17) 0.294 $$\mu(G)$$ 0.78 0.90 0.80 0.50 4 $$\text{Mean}(\mu(G_r))$$ 0.49 0.82 0.62 0.89 7.53 $$\text{SD}(\mu(G_r))$$ 0.11 0.06 0.06 0.30 1.24 Z-score 2.64 1.32 2.85 $$-1.32$$ $$-2.85$$ G4: (17, 36, 16) 0.265 $$\mu(G)$$ 0.79 0.88 0.67 0.71 6 $$\text{Mean}(\mu(G_r))$$ 0.49 0.87 0.64 0.79 6.48 $$\text{SD}(\mu(G_r))$$ 0.14 0.03 0.06 0.17 1.08 Z-score 2.16 0.50 0.45 $$-0.50$$ $$-0.45$$ G5: (100, 2461, 1047) 0.497 $$\mu(G)$$ 0.86 0.87 0.73 8.92 331 $$\text{Mean}(\mu(G_r))$$ 0.50 0.75 0.22 17.46 965.6 $$\text{SD}(\mu(G_r))$$ 0.00 0.00 0.01 0.02 9.08 Z-score 118.5 387.8 69.89 $$-387.8$$ $$-69.89$$ G6: (690, 1080, 220) 0.005 $$\mu(G)$$ 0.54 1.00 0.92 0.02 41 $$\text{Mean}(\mu(G_r))$$ 0.58 1.00 0.77 0.02 124.3 $$\text{SD}(\mu(G_r))$$ 0.07 0.00 0.01 0.00 4.97 Z-score $$-0.48$$ 8.61 16.75 $$-8.61$$ $$-16.75$$ G7: (1461, 3215, 1336) 0.003 $$\mu(G)$$ 0.50 1.00 0.77 0.06 371 $$\text{Mean}(\mu(G_r))$$ 0.50 1.00 0.59 0.06 653.4 $$\text{SD}(\mu(G_r))$$ 0.02 0.00 0.00 0.00 7.71 Z-score $$-0.33$$ 3.11 36.64 $$-3.11$$ $$-36.64$$ Table 5 Partial balance computer for signed graphs (G1–7) and reshuffled graphs Graph: $$(n, m, m^-)$$ Density $$T$$ $$A$$ $$F$$ $$\lambda$$ $$L$$ G1: (16, 58, 29) 0.483 $$\mu(G)$$ 0.87 0.88 0.76 1.04 7 $$\text{Mean}(\mu(G_r))$$ 0.50 0.76 0.49 2.08 14.65 $$\text{SD}(\mu(G_r))$$ 0.06 0.02 0.05 0.20 1.38 Z-score 6.04 5.13 5.54 $$-5.13$$ $$-5.54$$ G2: (18, 49, 12) 0.320 $$\mu(G)$$ 0.86 0.88 0.80 0.75 5 $$\text{Mean}(\mu(G_r))$$ 0.55 0.79 0.60 1.36 9.71 $$\text{SD}(\mu(G_r))$$ 0.09 0.03 0.05 0.18 1.17 Z-score 3.34 3.37 4.03 $$-3.37$$ $$-4.03$$ G3: (17, 40, 17) 0.294 $$\mu(G)$$ 0.78 0.90 0.80 0.50 4 $$\text{Mean}(\mu(G_r))$$ 0.49 0.82 0.62 0.89 7.53 $$\text{SD}(\mu(G_r))$$ 0.11 0.06 0.06 0.30 1.24 Z-score 2.64 1.32 2.85 $$-1.32$$ $$-2.85$$ G4: (17, 36, 16) 0.265 $$\mu(G)$$ 0.79 0.88 0.67 0.71 6 $$\text{Mean}(\mu(G_r))$$ 0.49 0.87 0.64 0.79 6.48 $$\text{SD}(\mu(G_r))$$ 0.14 0.03 0.06 0.17 1.08 Z-score 2.16 0.50 0.45 $$-0.50$$ $$-0.45$$ G5: (100, 2461, 1047) 0.497 $$\mu(G)$$ 0.86 0.87 0.73 8.92 331 $$\text{Mean}(\mu(G_r))$$ 0.50 0.75 0.22 17.46 965.6 $$\text{SD}(\mu(G_r))$$ 0.00 0.00 0.01 0.02 9.08 Z-score 118.5 387.8 69.89 $$-387.8$$ $$-69.89$$ G6: (690, 1080, 220) 0.005 $$\mu(G)$$ 0.54 1.00 0.92 0.02 41 $$\text{Mean}(\mu(G_r))$$ 0.58 1.00 0.77 0.02 124.3 $$\text{SD}(\mu(G_r))$$ 0.07 0.00 0.01 0.00 4.97 Z-score $$-0.48$$ 8.61 16.75 $$-8.61$$ $$-16.75$$ G7: (1461, 3215, 1336) 0.003 $$\mu(G)$$ 0.50 1.00 0.77 0.06 371 $$\text{Mean}(\mu(G_r))$$ 0.50 1.00 0.59 0.06 653.4 $$\text{SD}(\mu(G_r))$$ 0.02 0.00 0.00 0.00 7.71 Z-score $$-0.33$$ 3.11 36.64 $$-3.11$$ $$-36.64$$ Graph: $$(n, m, m^-)$$ Density $$T$$ $$A$$ $$F$$ $$\lambda$$ $$L$$ G1: (16, 58, 29) 0.483 $$\mu(G)$$ 0.87 0.88 0.76 1.04 7 $$\text{Mean}(\mu(G_r))$$ 0.50 0.76 0.49 2.08 14.65 $$\text{SD}(\mu(G_r))$$ 0.06 0.02 0.05 0.20 1.38 Z-score 6.04 5.13 5.54 $$-5.13$$ $$-5.54$$ G2: (18, 49, 12) 0.320 $$\mu(G)$$ 0.86 0.88 0.80 0.75 5 $$\text{Mean}(\mu(G_r))$$ 0.55 0.79 0.60 1.36 9.71 $$\text{SD}(\mu(G_r))$$ 0.09 0.03 0.05 0.18 1.17 Z-score 3.34 3.37 4.03 $$-3.37$$ $$-4.03$$ G3: (17, 40, 17) 0.294 $$\mu(G)$$ 0.78 0.90 0.80 0.50 4 $$\text{Mean}(\mu(G_r))$$ 0.49 0.82 0.62 0.89 7.53 $$\text{SD}(\mu(G_r))$$ 0.11 0.06 0.06 0.30 1.24 Z-score 2.64 1.32 2.85 $$-1.32$$ $$-2.85$$ G4: (17, 36, 16) 0.265 $$\mu(G)$$ 0.79 0.88 0.67 0.71 6 $$\text{Mean}(\mu(G_r))$$ 0.49 0.87 0.64 0.79 6.48 $$\text{SD}(\mu(G_r))$$ 0.14 0.03 0.06 0.17 1.08 Z-score 2.16 0.50 0.45 $$-0.50$$ $$-0.45$$ G5: (100, 2461, 1047) 0.497 $$\mu(G)$$ 0.86 0.87 0.73 8.92 331 $$\text{Mean}(\mu(G_r))$$ 0.50 0.75 0.22 17.46 965.6 $$\text{SD}(\mu(G_r))$$ 0.00 0.00 0.01 0.02 9.08 Z-score 118.5 387.8 69.89 $$-387.8$$ $$-69.89$$ G6: (690, 1080, 220) 0.005 $$\mu(G)$$ 0.54 1.00 0.92 0.02 41 $$\text{Mean}(\mu(G_r))$$ 0.58 1.00 0.77 0.02 124.3 $$\text{SD}(\mu(G_r))$$ 0.07 0.00 0.01 0.00 4.97 Z-score $$-0.48$$ 8.61 16.75 $$-8.61$$ $$-16.75$$ G7: (1461, 3215, 1336) 0.003 $$\mu(G)$$ 0.50 1.00 0.77 0.06 371 $$\text{Mean}(\mu(G_r))$$ 0.50 1.00 0.59 0.06 653.4 $$\text{SD}(\mu(G_r))$$ 0.02 0.00 0.00 0.00 7.71 Z-score $$-0.33$$ 3.11 36.64 $$-3.11$$ $$-36.64$$ Although neither of the networks is completely balanced, the small values of $$L(G)$$ suggest that removal of some edges makes the networks completely balanced. Table 5 also provides a comparison of partial balance between different data sets of similar sizes. In this regard, it is essential to know that the choice of measure can make a substantial difference. For instance among G1–G4, under $$T(G)$$, G1 and G3 are respectively the most and the least partially balanced networks. However, if we choose $$A(G)$$ as the measure, G1 and G3 would be the least and the most partially balanced networks respectively. This confirms our previous discussions on how choosing a different measure can substantially change the results and helps to clarify some of the conflicting observations in the literature [6, 17] and [16], as previously discussed in Section 8. In Table 5, the mean and standard deviation of measures for the reshuffled graphs $$(G_r)$$, denoted by $$\text{mean}(\mu(G_r))$$ and $$\text{SD}(\mu(G_r))$$, are also provided for comparison. We implement a very basic statistical analysis as in [33, 34] using $$\text{mean}(\mu(G_r))$$ and $$\text{SD}(\mu(G_r))$$ of 500 reshuffled graphs. Reshuffling the signs on the edges 500 times, we obtain two parameters of balance distribution for the fixed underlying structure. For measures of balance, Z-scores are calculated based on Equation (9.1). \begin{align} \label{eq26} Z=\frac{\mu(G)-\text{mean}(\mu(G_r))}{\text{SD}(\mu(G_r))} \end{align} (9.1) The Z-score shows how far the balance is with regards to balance distribution of the underlying structure. Positive values of Z-score for $$T(G)$$, $$A(G)$$ and $$F(G)$$ can be interpreted as existence of more partial balance than the average random level of balance. It is worth pointing out that the statistical analysis we have implemented is independent of the normalization method used in $$A(G)$$ and $$F(G)$$. The two right columns of 5 provide $$\lambda(G)$$ and $$L(G)$$ alongside their associated Z-scores. The Z-scores show that as measured by the frustration index and algebraic conflict, signed networks G1–G7 exhibit a level of partial (but not total) balance beyond that expected by chance. Based on these two measures, the level of partial balance is high for graphs G1, G2, G5, G6 and G7 while it is low for graph G3 and negligibly low for graph G4. It indicates that most of the real signed networks investigated are relatively consistent with the theory of structural balance. However, the Z-scores obtained based on the triangle index for G6–G7 show totally different results. Note that G6 and G7 are relatively sparse graphs which only have 70 and 1052 triangles. This may explain the difference between Z-scores of $$T(G)$$ and that of other measures. The numerical results using the algebraic conflict and frustration index support previous observations of real-world networks’ closeness to balance [6, 17]. 10. Conclusion and future research In this study, we started by discussing balance in signed networks in Sections 2 and 3 and introduced the notion of partial balance. We discussed different ways to measure partial balance in Section 4 and provided some observations on synthetic data in Sections 5 and 6. After gaining an understanding of the behaviour of different measures, basic axioms and desirable properties were used in Section 7 to rule out the measures that cannot be justified. We have discussed various methodologies and how they have led to conflicting observations in the literature in Section 8. Taking axiomatic properties of the measures into account, using the common cycle based measures denoted by $$D(G)$$ and $$C(G)$$ and the walk-based measure $$W(G)$$ is not recommended. $$D_k(G)$$ and $$A(G)$$ may introduce some problems, but overall using them seems to be more appropriate compared to $$D(G)$$, $$C(G)$$ and $$W(G)$$. The observations on synthetic data taken together with the axiomatic properties, recommend $$F(G)$$ as the best overall measure of partial balance. However, considering the difficulty of computing the exact value of $$L(G)$$ for very large graphs, one may approximate it using a non-zero optimality gap tolerance with the optimization models in [38]. Alternatively, $$A(G)$$ and $$D_k(G)$$ seem to be the other options accepting their potential shortcomings. Using the three measures $$F(G)$$, $$T(G)$$ and $$A(G)$$, each representing a family of measures, we compared balance in real signed graphs and analogous reshuffled graphs having the same structure in Section 9. Table 5 provides this comparison showing that different results are obtained under different measures. Returning to the questions posed at the beginning of this article, it is now possible to state that under the frustration index and algebraic conflict many signed networks exhibit a level of partial (but not total) balance beyond that expected by chance. However, the numerical results in Table 5 show that the level of balance observed using the triangle index can be totally different. One of the more significant findings to emerge from this study is that methods suggested for measuring balance may have different context and may require some justification before being interpreted based on their values. The present study confirms that some measures of partial balance cannot be taken as a reliable static measure to be used for analysing network dynamics. Although a numerical part of the current study is based on signed networks with less than a few thousand nodes, the analytical findings that were not restricted to a particular size suggest the inefficacy of some methods for analysing larger networks as well. One gap in this study is that we avoid using structural balance theory for analysing directed networks, making directed signed networks like Epinions, Slashdot and Wikipedia Elections [10, 16, 31] data sets untested by our approach. However, see our discussion in Section 8. The findings of this study have a number of important implications for future investigation. Although this study focuses on partial balance, the findings may well have a bearing on link prediction and clustering in signed networks [49]. Some other theoretical topics of interest in signed networks are network dynamics [50] and opinion dynamics [51]. Effective methods of signed network structural analysis can contribute to these topics as well. From a practical viewpoint, international relations is a crucial area to implement signed network structural analysis. Having an efficient measure of partial balance in hand, international relations can be investigated in terms of evaluation of partial balance over time for networks of states. Footnotes Edited by: Michele Benzi Acknowledgements We are grateful for the valuable comments from the anonymous reviewers that have improved this article. References 1. Heider F. ( 1944 ) Social perception and phenomenal causality. Psychol. Rev. , 51 , 358 – 378 . Google Scholar CrossRef Search ADS 2. Cartwright D. & Harary F. ( 1956 ) Structural balance: a generalization of Heider’s theory. Psychol. Rev. , 63 , 277 – 293 . Google Scholar CrossRef Search ADS PubMed 3. Harary F. ( 1957 ) Structural duality. Behav. Sci. , 2 , 255 – 265 . Google Scholar CrossRef Search ADS 4. Acharya B. D. ( 1980 ) Spectral criterion for cycle balance in networks. J. Graph Theor. , 4 , 1 – 11 . Google Scholar CrossRef Search ADS 5. Zaslavsky T. ( 2010 ) Balance and clustering in signed graphs. Slides from Lectures at the CR RAO Advanced Institute of Mathematics, Statistics and Computer Science , vol. 26 . India : Univ. of Hyderabad . 6. Kunegis J. ( 2014 ) Applications of structural balance in signed social networks. Preprint arXiv:1402.6865 . 7. Hou Y. P. ( 2004 ) Bounds for the least Laplacian eigenvalue of a signed graph. Acta Math. Sin. , 21 , 955 – 960 . Google Scholar CrossRef Search ADS 8. Harary F. ( 1977 ) Graphing conflict in international relations. The PAPERS of the Peace Science Society , 27 , 1 – 10 . 9. Giscard P.-L. , Kriege N. & Wilson R. C. ( 2016a ) A general purpose algorithm for counting simple cycles and simple paths of any length. Preprint arXiv:1612.05531 . 10. Giscard P.-L. , Rochet P. & Wilson R. C. ( 2016b ) Evaluating balance on social networks from their simple cycles. J. Complex Netw ., to appear. Preprint arXiv:1606.03347 . 11. Norman R. Z. & Roberts F. S. ( 1972 ) A derivation of a measure of relative balance for social structures and a characterization of extensive ratio systems. J. Math. Psychol. , 9 , 66 – 91 . Google Scholar CrossRef Search ADS 12. Birmelé E. , Ferreira R. , Grossi R. , Marino A. , Pisanti N. , Rizzi R. & Sacomoto G. ( 2013 ) Optimal listing of cycles and st-paths in undirected graphs. Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms ( Khanna Sanjeev ed.). New Orleans, Louisiana : SIAM , pp. 1884 – 1896 . 13. Terzi E. & Winkler M. ( 2011 ) A spectral algorithm for computing social balance. International Workshop on Algorithms and Models for the Web-Graph ( Frieze A. Horn P. & Prałat P. eds). Berlin, Heidelberg : Springer , pp. 1 – 13 . 14. Bonacich P. ( 2012 ) Introduction to Mathematical Sociology . Princeton : Princeton University Press . 15. Pelino V. & Maimone F. ( 2012 ) Towards a class of complex networks models for conflict dynamics. Preprint arXiv:1203.1394 . 16. Estrada E. & Benzi M. ( 2014 ) Walk-based measure of balance in signed networks: detecting lack of balance in social networks. Phys. Rev. E , 90 , 1 – 10 . Google Scholar CrossRef Search ADS 17. Facchetti G. , Iacono G. & Altafini C. ( 2011 ) Computing global structural balance in large-scale signed social networks. Proc. Natl. Acad. Sci. , 108 , 20953 – 20958 . Google Scholar CrossRef Search ADS 18. Singh R. & Adhikari B. ( 2017 ) Measuring the balance of signed networks and its application to sign prediction. J. Stat. Mech. Theory Exp. , 6 , 1 – 16 . 19. Kunegis J. , Schmidt S. , Lommatzsch A. , Lerner J. , De Luca E. W. & Albayrak S. ( 2010 ) Spectral analysis of signed graphs for clustering, prediction and visualization. SDM ( Parthasarathy S. Liu B. Goethals B , Pei J. & Kamath C. ), vol. 10 . Columbus, Ohio, USA : SIAM , pp. 559 – 570 . 20. Belardo F. ( 2014 ) Balancedness and the least eigenvalue of Laplacian of signed graphs. Linear Algebra Appl. , 446 , 133 – 147 . Google Scholar CrossRef Search ADS 21. Belardo F. & Zhou Y. ( 2016 ) Signed Graphs with extremal least Laplacian eigenvalue. Linear Algebra Appl. , 497 , 167 – 180 . Google Scholar CrossRef Search ADS 22. Harary F. ( 1959 ) On the measurement of structural balance. Behav. Sci. , 4 , 316 – 323 . Google Scholar CrossRef Search ADS 23. Iacono G. , Ramezani F. , Soranzo N. & Altafini C. ( 2010 ) Determining the distance to monotonicity of a biological network: a graph-theoretical approach. Syst. Biol., IET , 4 , 223 – 235 . Google Scholar CrossRef Search ADS 24. Garey M. R. & Johnson D. S. ( 2002 ) Computers and intractability. vol. 29 , a guide to the theory of NP-completeness . 25. El Maftouhi A. , Manoussakis Y. & Megalakaki O. ( 2012 ) Balance in random signed graphs. Internet Math. , 8 , 364 – 380 . Google Scholar CrossRef Search ADS 26. Tomescu I. ( 1973 ) Note sur une caractérisation des graphes dont le degré de déséquilibre est maximal. Math. Sci. Hum. , 42 , 37 – 40 . 27. Akiyama J. , Avis D. , Chvátal V. & Era H. ( 1981 ) Balancing signed graphs. Discrete Appl. Math. , 3 , 227 – 233 . Google Scholar CrossRef Search ADS 28. Doreian P. ( 2005 ) Generalized Blockmodeling: Structural Analysis in the Social Sciences , vol. 25 . Cambridge, New York : Cambridge University Press , 2005 . 29. Doreian P. & Mrvar A. ( 2009 ) Partitioning signed social networks. Soc. Netw. , 31 , 1 – 11 . Google Scholar CrossRef Search ADS 30. Doreian P. ( 2015 ) Structural Balance and Signed International Relations. J. Soc. Struct. , 16 , 1 – 49 . 31. Leskovec J. , Huttenlocher D. & Kleinberg J. ( 2010 ) Signed networks in social media. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems ( Mynatt E. D. Schoner D. Fitzpatrick G. Hudson S. E. Edwards W. K. & Rodden T. eds). Atlanta, Georgia, USA : ACM , pp. 1361 – 1370 . 32. Yap J. & Harrigan N. ( 2015 ) Why does everybody hate me? Balance, status, and homophily: the triumvirate of signed tie formation. Soc. Netw. , 40 , 103 – 122 . Google Scholar CrossRef Search ADS 33. Szell M. , Lambiotte R. & Thurner S. ( 2010 ) Multirelational organization of large-scale social networks in an online world. Proc. Natl. Acad. Sci. , 107 , 13636 – 13641 . Google Scholar CrossRef Search ADS 34. Szell M. & Thurner S. ( 2010 ) Measuring social dynamics in a massive multiplayer online game. Soc. Netw. , 32 , 313 – 329 . Google Scholar CrossRef Search ADS 35. Cai Q. , Gong M. , Ma L. , Wang S. , Jiao L. & Du H. ( 2015 ) A particle swarm optimization approach for handling network social balance problem. IEEE Congress on Evolutionary Computation (CEC), 2015 ( Obayashi S. Poloni C. & Murata T. eds). Sendai, Japan : IEEE , pp. 3186 – 3191 . 36. Ma L. , Gong M. , Du H. , Shen B. & Jiao L. ( 2015 ) A memetic algorithm for computing and transforming structural balance in signed networks. Knowledge Based Syst. , 85 , 196 – 209 . Google Scholar CrossRef Search ADS 37. Schwartz T. ( 2010 ) The friend of my enemy is my enemy, the enemy of my enemy is my friend: axioms for structural balance and bi-polarity. Math. Soc. Sci. , 60 , 39 – 45 . Google Scholar CrossRef Search ADS 38. Aref S. , Mason A. J. & Wilson M. C. ( 2016 ) An exact method for computing the frustration index in signed networks using binary programming. Preprint arXiv:1611.09030 . 39. Prell C. ( 2012 ) Social Network Analysis: History, Theory and Methodology . London, Los Angeles : SAGE , 2012 . 40. Read K. E. ( 1954 ) Cultures of the central highlands, New Guinea. Southwestern J. Anthropol. , 10 , 1 – 43 . Google Scholar CrossRef Search ADS 41. Sampson S. F. ( 1968 ) A novitiate in a period of change. An experimental and case study of social relationships (PhD thesis). Cornell University, Ithaca . 42. Newcomb T. M. ( 1961 ) The Acquaintance Process. New York: Holt, Rinehart and Winston. The General Nature of Peer Group Influence, pp. 2-16 in College Peer Groups ( Newcomb T. M. & Wilson E. K. eds). Chicago : Aldine Publishing Co . 43. Lemann T. B. & Solomon R. L. ( 1952 ) Group characteristics as revealed in sociometric patterns and personality ratings. Sociometry , 15 , 7 – 90 . Google Scholar CrossRef Search ADS 44. Neal Z. ( 2014 ) The backbone of bipartite projections: inferring relationships from co-authorship, co-sponsorship, co-attendance and other co-behaviors. Soc. Netw. , 39 , 84 – 97 . Google Scholar CrossRef Search ADS 45. Fowler J. H. ( 2006 ) Legislative cosponsorship networks in the US House and Senate. Soc. Netw. , 28 , 454 – 465 . Google Scholar CrossRef Search ADS 46. DasGupta B. , Enciso Enciso. G. , Sontag E. & Zhang Y. ( 2007 ) Algorithmic and complexity results for decompositions of biological networks into monotone subsystems. Biosystems , 90 , 161 – 178 . Google Scholar CrossRef Search ADS PubMed 47. Costanzo M. C. , Crawford Crawford. M. , Hirschman Hirschman. J. , Kranz Kranz. J. , Olsen P. , Robertson Robertson. L. , Skrzypek Skrzypek. M. , Braun Braun. B. , Hopkins Hopkins. K. , Kondu P. , Lengieza C. , Lew-Smith Lew-Smith. J. , Tillberg M. & Garrels J. I. ( 2001 ) YPD, PombePD and WormPD: model organism volumes of the BioKnowledge Library, an integrated resource for protein information. Nucleic Acids Res. , 29 , 75 – 79 . Google Scholar CrossRef Search ADS PubMed 48. Salgado H. , Gama-Castro S. , Peralta-Gil M. , Diaz-Peredo E. , Sánchez-Solano F. , Santos-Zavaleta A. , Martinez-Flores I. , Jiménez-Jacinto V. , Bonavides-Martinez C. , Segura-Salazar J. et al. ( 2006 ) RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Res. , 34 (suppl. 1) , D394 – D397 . Google Scholar CrossRef Search ADS PubMed 49. Gallier J. ( 2016 ) Spectral theory of unsigned and signed graphs. applications to graph clustering: a survey. Preprint arXiv:1601.04692 . 50. Tan S. & Lü J. ( 2016 ) An evolutionary game approach for determination of the structural conflicts in signed networks. Sci. Rep. , 6 , 22022 . Google Scholar CrossRef Search ADS PubMed 51. Li Y. , Chen W. , Wang Y. & Zhang Z.-L. ( 2015 ) Voter Model on Signed Social Networks. Internet Math. , 11 , 93 – 133 . Google Scholar CrossRef Search ADS 52. Flajolet P. & Sedgewick R. ( 2009 ) Analytic Combinatorics . Cambridge UK : Cambridge University Press . 53. Abelson R. P. & Rosenberg M. J. ( 1958 ) Symbolic psycho-logic: a model of attitudinal cognition. Behav. Sci. , 3 , 1 – 13 . Google Scholar CrossRef Search ADS 54. Doreian P. ( 2008 ) A multiple indicator approach to blockmodeling signed networks. Soc. Netw. , 30 , 247 – 258 . Google Scholar CrossRef Search ADS Appendix A. Details of calculations In order to simplify the sum $$E(D_k(G)) = \sum \limits_{i \: \text{even}} {\binom{k}{i}} q^i (1-q)^{k-i}$$, one may add the two following equations and divide the result by 2: \begin{align} \sum \limits_{i} {\binom{k}{i}} q^i (1-q)^{k-i} &= (q+(1-q))^k \\ \end{align} (A.1) \begin{align} \sum \limits_{i} {\binom{k}{i}} (-q)^i (1-q)^{k-i} &= (-q+(1-q))^k \end{align} (A.2) In $${K}_n^a$$, a $$k$$-cycle is specified by choosing $$k$$ vertices in some order, then correcting for the overcounting by dividing by $$2$$ (the possible directions) and $$k$$ (the number of starting points, namely the length of the cycle). If the unique negative edge is required to belong to the cycle, by orienting this in a fixed way we need choose only $$k-2$$ further elements in order, and no overcounting occurs. The numbers of negative cycles and total cycles are as follows. \begin{align} \sum_{k=3}^n O_k^- =\sum_{k=3}^n\frac{(n-2)!}{(n-k)!} ,\quad \sum_{k=3}^n O_k =\sum_{k=3}^n \frac{n!}{2k(n-k)!} \end{align} (A.3) Asymptotic approximations for these sums can be obtained by introducing the exponential generating function. For example, letting $$a_n = \sum_{1\leq k\leq n} \frac{n!}{(n-k)!k}$$ we have $$\sum_{n\geq 0} \frac{a_n}{n!}x^n = \sum_{n,k} \sum_{k\leq n} \frac{n!}{(n-k)!k} x^n = \sum_{k\geq 1} \frac{1}{k} \sum_{n\geq k} \frac{1}{(n-k)!} x^n = \sum_{k\geq 1} \frac{x^k}{k} \sum_{m\geq 0} \frac{1}{m!}x^m = e^x \log\left(\frac{1}{1-x}\right)\!.$$ Similarly we obtain $$\sum_{n\geq 0} \sum_{k\geq 0} \frac{1}{(n-k)!} x^n = \frac{e^x}{1-x}.$$ Standard singularity analysis methods [52] show the denominator of the expression for $$D(K^a_n)$$ to be asymptotic to $$(n-1)!e/2$$ while the number of negative cycles is asymptotic to $$(n-2)!e$$. Similarly the weighted sum defining $$C(K^a_n)$$, where we choose $$f(k) = 1/k!$$, can be expressed using the ordinary generating function, which for the denominator turns out to be $$\frac{1}{1-x} \log\left( \frac{1-2x}{1-x}\right).$$ Again, singularity analysis techniques yield an approximation $$2^n/n$$. The numerator is easier and asymptotic to $$2^n/(n^2-n)$$. This yields the result. The unsigned adjacency matrix $$|\textbf{A}|$$ of the complete graph has the form $$\textbf{E} - \textbf{I}$$ where $$\textbf{E}$$ is the matrix of all $$1$$’s. The latter matrix has rank 1 and non-zero eigenvalue $$n$$. Thus $$|\textbf{A}| _{({K}_n^a)}$$ has eigenvalues $$n-1$$ (with multiplicity 1) and $$-1$$ (with multiplicity $$n-1$$). The matrix $$\textbf{A} _{({K}_n^a)}$$ has a similar form and we can guess eigenvectors of the form $$(-1,1,0, \dots, 0)$$ and $$(a,a,1,1, \dots , 1)$$. Then $$a$$ satisfies a quadratic $$2a^2 + (n-3)a - (n-2) = 0$$. Solving for $$a$$ and the corresponding eigenvalues, we obtain eigenvalues $$(n-4 \pm \sqrt{(n-2)(n+6)})/2, 1, -1$$ (with multiplicity $$n-3$$)). This yields $$K{({K}_n^a)} = \frac{(n-3)e^{-1} + e + e^{\frac{n-4 - \sqrt{(n-2)(n+6)}}{2}} + e^{\frac{n-4 + \sqrt{(n-2)(n+6)}}{2}}}{(n-1)e^{-1} + e^{n-1}}$$ which results in $$W({K}_n^a) \sim \frac{1+e^{-4/n}}{2}$$. Furthermore, since every node of $$K_n$$ has degree $$n-1$$, the eigenvalues of $$\textbf{L}:=(n-1)\textbf{I}-\textbf{A}$$ are precisely of the form $$n-1 - \lambda$$ where $$\lambda$$ is an eigenvalue of $$\textbf{A}$$. Measures of partial balance for $${K}_n^a$$ can therefore be expressed by the formulae (A.4) – (A.10): \begin{gather}\label{eq15} D({K}_n^a)=1- \frac{\sum_{k=3}^n \frac{(n-2)!}{(n-k)!}}{\sum_{k=3}^n \frac{n!}{2k(n-k)!}} \sim 1 - \frac{2}{n}\\ \label{eq15.5} \end{gather} (A.4) \begin{gather} C({K}_n^a)=1- \frac{\sum_{k=3}^n \frac{(n-2)!}{(n-k)!k!}}{\sum_{k=3}^n \frac{n!}{2k(n-k)!k!}} \sim 1 - \frac{1}{n}\\ \label{eq17} \end{gather} (A.5) \begin{gather} D_k({K}_n^a)=1-\frac{\frac{(n-2)!}{(n-k)!}}{\frac{n!}{2k(n-k)!}}=1-\frac{2k}{n(n-1)} \sim 1 - \frac{2k}{n^2}\\ \label{eq16} \end{gather} (A.6) \begin{gather} W({K}_n^a) \sim \frac{1+e^{-4/n}}{2} \sim 1-\frac{2}{n}\\ \label{eq17.2} \end{gather} (A.7) \begin{gather} \lambda({K}_n^a)= n-1 - (n-4 + \sqrt{(n-2)(n+6)})/2 = (n+2 - \sqrt{(n-2)(n+6)})/2\\ \label{eq17.4} \end{gather} (A.8) \begin{gather} A({K}_n^a)= 1- \frac{n+2 - \sqrt{(n-2)(n+6)}}{2n-4} \sim 1-\frac{4}{n^2}\\ \label{eq18} \end{gather} (A.9) \begin{gather} F({K}_n^a)=1-\frac{2}{n(n-1)/2}=1-\frac{4}{n(n-1)} \sim 1 - \frac{4}{n^2} \end{gather} (A.10) In $${K}_n^c$$, all cycles of odd length are unbalanced and all cycles of even length are balanced. Therefore: \begin{align} \sum_{k=3}^n O_k^+ = \sum_{\text{even}}^n \frac{n!}{2k(n-k)!} \end{align} (A.11) It follows that $$D_k({K}_n^c)$$ equals 0 for odd $$k$$ and 1 for even $$k$$. Based on maximality of $$\lambda(G)$$ in $${K}_n^c$$, $$A({K}_n^c)=0$$. Using the above generating function techniques we obtain that the numerator of $$D(K^c_n)$$ is asymptotic to $$(n-1)!(e+(-1)^ne^{-1})/4$$. The denominator we know from above is asymptotic to $$(n-1)!e/2$$. This yields $$D(K^c_n) \sim 1/2 + (-1)^n e^{-2}$$. Note that $$e^{-2} \approx 0.135$$ and this explains the oscillation in Fig. 4. Similarly we obtain results for $$C(K^c_n)$$. $$|\textbf{A}|_{({K}_n^c)}$$ has eigenvalues $$n-1$$ (with multiplicity 1) and $$-1$$ (with multiplicity $$n-1$$). The matrix $$\textbf{A}_{({K}_n^c)}$$ has a similar form and the corresponding eigenvalues would be $$1-n$$ (with multiplicity 1) and $$1$$ (with multiplicity $$n-1$$). This yields $$K{({K}_n^c)} = \frac{(n-1)e^{1} + e^{1-n}}{(n-1)e^{-1} + e^{n-1}}$$ which results in $$W({K}_n^c) \sim \frac{1+e^{2-2n}}{2}$$. Moreover, a closed-form formula for $$L({K}_n^c)$$ can be expressed based on a maximum cut which gives a function of $$n$$ equal to an upper bound of the frustration index under a different name in [53]. Measures of partial balance for $${K}_n^c$$ can be expressed via the closed-form formulae as stated in (A.12)–(A.18): \begin{gather}\label{eq21} D({K}_n^c)= \frac{\sum_{\text{$k$ even}}^n \frac{n!}{2k(n-k)!}}{\sum_{k=3}^n \frac{n!}{2k(n-k)!}} \sim \frac{1}{2} + (-1)^n e^{-2}\\ \label{eq22} \end{gather} (A.12) \begin{gather} C({K}_n^c)= \frac{\sum_{\text{$k$ even}}^n \frac{n!}{2k(n-k)!k!}}{\sum_{k=3}^n \frac{n!}{2k(n-k)!k!}} \sim \frac{1}{2} - \frac{3n\log n}{2^n}\\ \label{eq22.5} \end{gather} (A.13) \begin{gather} D_k({K}_n^c)= \left\{ \begin{array}{ll} 1 & \mbox{if } k \ \text{is even} \\ 0 & \mbox{if } k \ \text{is odd} \end{array} \right.\\ \label{eq23} \end{gather} (A.14) \begin{gather} W({K}_n^c) \sim \frac{1+e^{2-2n}}{2}\\ \end{gather} (A.15) \begin{gather} \label{eq23.3} \lambda({K}_n^c)=\lambda_\text{max}=\overline{d}_{\text{max}}-1=n-2\\ \end{gather} (A.16) \begin{gather} \label{eq24} L({K}_n^c)= \left\{ \begin{array}{ll} ({n^2-2n})/{4} & \mbox{if } n \ \text{is even} \\ ({n^2-2n+1})/{4} & \mbox{if } n \ \text{is odd} \end{array} \right.\\ \end{gather} (A.17) \begin{gather} \label{eq25} F({K}_n^c)= \left\{ \begin{array}{ll} 1-\frac{n(n-2)/4}{n(n-1)/4}=\frac{1}{n-1} & \mbox{if } n \ \text{is even} \\ 1-\frac{(n-1)(n-1)/4}{n(n-1)/4}=\frac{1}{n} & \mbox{if } n \ \text{is odd} \end{array} \right. \end{gather} (A.18) Appendix B. Counterexamples and proof ideas for the axioms and desirable properties Axioms: Axiom 1 holds in all the measures introduced due to the systematic normalization implemented. $$D_k(G)$$, $$A(G)$$ and $$Y(G)$$ do not satisfy Axiom 2. All $$k$$-cycles being balanced, $$D_k(G)$$ fails to detect the imbalance in graphs with unbalanced cycles of different length. $$A(G \oplus C^+)=1$$ for unbalanced graphs which makes $$A(G)$$ fail Axiom 2. $$Y(G)$$ fails on detecting balance in completely bi-polar signed graphs that are indeed balanced. As long as $$\mu(G\oplus H)$$ can be written in the form of $$(a+c)/(b+d)$$ where $$\mu(G)=a/b$$ and $$\mu(H)=c/d$$, $$\mu$$ satisfies Axiom 3. So all the measures considered satisfy Axiom 3, except for $$A(G)$$. In case of $$\lambda(G) < \lambda(H)$$ and $$\lambda_\text{max}(G) < \lambda_\text{max}(H)$$, $$A(G \oplus H)= 1 - \frac{\lambda(G)}{\lambda_\text{max}(H)} > A(H)$$ which shows that $$A(G)$$ fails Axiom 3. The sign of cycles (closed walks), the Laplacian eigenvalues [21], and the frustration index [5] will not change by applying the switching function introduced in (3.1). Therefore, Axiom 4 holds for all the measures discussed except for $$X(G)$$ and $$Y(G)$$ because they depend on $$m^-$$, which changes in switching. Desirable properties: Clearly in B1, $$C^+_3$$ contributes positively to $$D(G)$$ and $$C(G)$$, whereas for $$D_k(G)$$ it depends on $$k$$ which makes it fail B1. As $$W(C^+_3)$$ equals 1, $${\mathrm{Tr}}(e^A)/{\mathrm{Tr}}(e^{|A|})$$ would be added by equal terms in both the numerator and denominator leading to $$W(G)$$ satisfying B1. $$A(G)$$ satisfies B1 because $$A(G\oplus C^+_3)=1$$. As $$m$$ increases by 3, $$F(G)$$ satisfies B1. The dependency of $$X(G)$$ and $$Y(G)$$ on $$m^-$$ and incapability of the binary measure, $$Z(G)$$, in providing values between 0 and 1 make them fail B1. $$C^-_3$$ adds only to the denominators of $$D(G)$$ and $$C(G)$$, whereas for $$D_k(G)$$ it depends on $$k$$ which makes it fail B2. Following the addition of a negative 3-cycle, $$W(G)$$ is observed to increase resulting in its failure in B2 (e.g. take $$G=K_5$$ with single negative edge, and $$C^-_3$$ having a single negative edge). As $$A(G)\neq0$$ and $$A(C^-_3)=0$$, the value of $$A(G \oplus C^-_3)$$ does not change when a negative 3-cycle is added. Therefore, it fails B2. Moreover, $$F(G)$$ fails B2 whenever $$L(G) \geq m/3$$ as observed in a family of graphs in Section 6.2. However, $$F^\prime (G)$$ introduced in Eq. 6.1 which only differs in normalization, satisfies this desirable property. The measures $$X(G)$$ and $$Y(G)$$ fail B2, but the binary measure, $$Z(G)$$, satisfies it. All the cycle-based measures, namely $$D(G),C(G)$$ and $$D_3(G)$$ fail B3 (e.g. take $$G=K_4$$ with two symmetrically located negative edges). $$W(G)$$ is also observed to fail B3 (for instance, take $$G$$ as the disjoint union of a 3-cycle and a 5-cycle each having 1 negative edge). It is known that $$\lambda(G \ominus e) \leq \lambda(G)$$ [21]. However in some cases where $$\lambda_\text{max}(G \ominus e) < \lambda_\text{max} (G)$$ counterexamples are found showing $$A(G)$$ fails on B3 (consider a graph with $$n=8$$, $$m^+=10$$, $$m^-=3$$, $$\min{|E^*|}=3$$ in which $$\lambda_\text{max} (G)=6$$ and $$\lambda_\text{max}(G \ominus e)=3$$). $$Y(G)$$ and $$Z(G)$$ fail B3. Moreover, $$F(G)$$ satisfies B3 because $$L(G \ominus e) = L(G) -1$$. The cycle-based measures and $$W(G)$$ do not satisfy B4. For $$D_3(G)$$, we tested a graph with $$n=7, m=15, |E^*|=3$$ and we observed $$D_3(G \oplus e) > D_3(G)$$. According to Belardo and Zhou, $$\lambda(G \oplus e) \geq \lambda(G)$$ [21]. However in some cases where $$\lambda_\text{max}(G \oplus e) > \lambda_\text{max} (G)$$ counterexamples are found showing $$A(G)$$ fails on B4. counterexamples showing $$D(G),C(G),W(G)$$ and $$A(G)$$ fail B4, are similar to that of B3. Moreover, $$F(G)$$ satisfies B4 as do $$X(G)$$ and $$Z(G)$$, while $$Y(G)$$ fails B4 when $$e$$ is positive. Appendix C. Inferring undirected signed graphs Sampson collected different sociometric rankings from a group of 18 monks at different times [41]. The data provided includes rankings on like, dislike, esteem, disesteem, positive influence, negative influence, praise and blame. We have considered all the positive rankings as well as all the negative ones. Then only the reciprocated relations with similar signs are considered to infer an undirected signed edge between two monks (see [29] and how the authors inferred a directed signed graph in their Table 5 by summing the influence, esteem and respect relations). Newcomb reported rankings made by 17 men living in a pseudo-dormitory [42]. We used the ranking data of the last week which includes complete ranks from 1 to 17 gathered from each man. As the gathered data is related to complete ranking, we considered ranks 1–5 as one-directional positive relations and 12–17 as one-directional negative relations. Then only the reciprocated relations with similar signs are considered to infer an undirected signed edge between two men (see [29] and how the authors converted the top three and bottom three ranks to a directed signed edges in their Fig. 4.). Lemann and Solomon collected ranking data based on multiple criteria from female students living in off-campus dormitories [43]. We used the data for house B which is resulted by integrating top and bottom three one-directional rankings each for multiple criteria. As the gathered data itself is related to top and bottom rankings, we considered all the ranks as one-directional signed relations. Then only the reciprocated relations with similar signs are considered to infer an undirected signed edge between two women (see [54] and how the author inferred a directed signed graph in their Fig. 5 from the data for house B). © The authors 2017. Published by Oxford University Press. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

### Journal

Journal of Complex NetworksOxford University Press

Published: Aug 1, 2018

## You’re reading a free preview. Subscribe to read the entire article.

### DeepDyve is your personal research library

It’s your single place to instantly
that matters to you.

over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month ### Explore the DeepDyve Library ### Search Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly ### Organize Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place. ### Access Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals. ### Your journals are on DeepDyve Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more. All the latest content is available, no embargo periods. DeepDyve ### Freelancer DeepDyve ### Pro Price FREE$49/month
\$360/year

Save searches from
PubMed

Create folders to

Export folders, citations