Journal of Complex Networks, Advance Article, May 7, 2018

53 pages


- Publisher: Oxford University Press
- Copyright: © The authors 2018. Published by Oxford University Press. All rights reserved.
- ISSN: 2051-1310
- eISSN: 2051-1329
- DOI: 10.1093/comnet/cny011

Abstract

We propose two spectral algorithms for partitioning nodes in directed graphs respectively with a cyclic and an acyclic pattern of connection between groups of nodes, referred to as blocks. Our methods are based on the computation of extremal eigenvalues of the transition matrix associated to the directed graph. The two algorithms outperform state-of-the-art methods for the detection of node clusters in synthetic block-cyclic or block-acyclic graphs, including methods based on blockmodels, bibliometric symmetrization and random walks. In particular, we demonstrate the ability of our algorithms to focus on the cyclic or the acyclic patterns of connection in directed graphs, even in the presence of edges that perturb these patterns. Our algorithms have the same space complexity as classical spectral clustering algorithms for undirected graphs and their time complexity is also linear in the number of edges in the graph. One of our methods is applied to a trophic network based on predator–prey relationships. It successfully extracts common categories of prey and predators encountered in food chains. The same method is also applied to highlight the hierarchical structure of a worldwide network of autonomous systems depicting business agreements between Internet Service Providers.

1. Introduction

The past years have witnessed the emergence of large networks in various disciplines including social science, biology and neuroscience. These networks model pairwise relationships between entities such as predator–prey relationships in trophic networks, friendship in social networks, etc. These structures are usually represented as graphs where pairwise relationships are encoded as edges connecting vertices in the graph. When the relationships between entities are not bidirectional, the resulting graph is directed.
Some directed networks in real-world applications have a block-acyclic structure: nodes can be partitioned into groups of nodes such that the connections between groups form an acyclic pattern as depicted in Fig. 1a. Such patterns are encountered in networks that tend to have a hierarchical structure such as trophic networks modelling predator–prey relationships [1] or networks of autonomous systems where edges denote money transfers between Internet Service Providers [2]. On the other hand, one may encounter directed graphs with a block-cyclic structure (Fig. 1b) when the network models a cyclic phenomenon such as the carbon cycle [3]. These two patterns are intimately related as the removal of a few edges from a block-cyclic graph makes it block-acyclic. This relationship is also observed in real-world networks: a graph of predator–prey interactions can be viewed as an acyclic version of the carbon cycle. In this article, we take advantage of this connection between the two types of patterns and formulate two closely related algorithms for the detection of groups of nodes respectively in block-acyclic and block-cyclic graphs in the presence of slight perturbations in the form of edges that do not follow the block-cyclic or the block-acyclic pattern of connections in the graph.

Fig. 1. (a) Block-acyclic graph and (b) block-cyclic graph. Labels of blocks in the block-acyclic graph denote the ranking of blocks (topological order of blocks in the block-acyclic graph).

The partitioning of nodes in block-acyclic and block-cyclic networks can be viewed as a clustering problem. In graph mining, clustering refers to the task of grouping nodes that are similar in some sense. The resulting groups are called clusters.
In the case of directed graphs, the definition of similarity between two nodes may take the directionality of edges incident to these nodes into account. Clustering algorithms that take edge directionality into account may be referred to as pattern-based clustering algorithms, which extract pattern-based clusters [4]: such methods produce a result in which nodes within the same cluster have similar connections with other clusters. Groups of nodes in block-acyclic and block-cyclic graphs are examples of pattern-based clusters. Several approaches have been proposed for the detection of pattern-based clusters in directed graphs [4]. Popular families of methods for the detection of pattern-based clusters are random walk based algorithms, blockmodels (more specifically stochastic blockmodels) and bibliometric symmetrization. Random walk based models are usually meant to detect density-based clusters [5]; however, by defining a two-step random walk as suggested in [6], pattern-based clusters such as blocks in block-cyclic graphs can also be detected. The success of this method is guaranteed only when the graph is strongly connected, and the result is unreliable when the graph is sparse, with many nodes of zero in- or out-degree. Blockmodelling approaches [7] are based on the definition of an image graph representing connections between blocks of nodes in a graph; the block membership is selected so that the corresponding image graph is consistent with the edges of the original graph. However, in existing algorithms the optimization process relies, for instance, on simulated annealing, hence the computational cost is high and there is a risk of falling into a local optimum. Moreover, this method may also fail when the graph is sparse. Clustering algorithms based on stochastic blockmodels detect clusters of nodes that are stochastically equivalent.
In particular, the method proposed in [8] estimates the block membership of nodes by defining a vertex embedding based on the extraction of singular vectors of the adjacency matrix, which turns out to be efficient compared to the common methods based on expectation maximization. However, the assumption of stochastic equivalence implies that the degrees of nodes within clusters exhibit a certain regularity, as shown further on. Hence, this approach may yield poor results in detecting clusters in real-world block-cyclic and block-acyclic networks. A related category of methods is bibliometric symmetrization, which defines a node similarity matrix as a weighted sum of the co-coupling matrix $$WW^T$$ and the co-citation matrix $$W^TW$$ [9], where $$W$$ is the adjacency matrix of the graph. However, it may also fail when the degrees of nodes are not sufficiently regular within groups. To relax this assumption, degree-corrected versions with variables representing the degrees of nodes were proposed [10, 11]. But fitting these models relies on costly methods that do not eliminate the risk of falling into a local optimum (simulated annealing, local heuristics, etc.) [12]. Hence methods based on random walks, bibliometric symmetrization and blockmodels, with or without degree correction, may yield poor results in the detection of blocks of nodes in block-cyclic and block-acyclic graphs, due to assumptions of connectivity or regularity or due to the computational difficulty of solving the associated optimization problems. The methods described in this article partly alleviate these weaknesses. In this article, we present two new clustering algorithms that extract clusters in block-cyclic and block-acyclic graphs in the presence of perturbing edges. The first algorithm, called the Block-Cyclic Spectral (BCS) clustering algorithm, is designed for the detection of clusters in block-cyclic graphs.
The second algorithm, referred to as the Block-Acyclic Spectral (BAS) clustering algorithm, is a slight extension of the first one that is able to detect clusters in block-acyclic graphs. Two types of perturbation are considered in our study: in the first case, the perturbing edges are uniformly distributed across the graph, while, in the second case, the perturbation is not uniform, with some groups of nodes experiencing a higher volume of perturbing edges. A theoretical analysis of the effect of perturbing edges provides a sufficient condition on the perturbation for the success of our algorithms: in particular, when the perturbation is uniform, the condition specifies a maximum number of perturbing edges that can be tolerated by our BCS clustering algorithm to recover the blocks of a block-cyclic graph. Moreover, experimental results show that, even when the perturbation is not uniform, our algorithms are successful and outperform other approaches. In particular, experiments on synthetic unweighted graphs show that our BCS clustering algorithm is able to recover the blocks perfectly even when appending $$\mathscr{O}(|E|)$$ perturbing edges to a block-cyclic graph containing $$|E|$$ edges. We apply the second algorithm to two real-world datasets: a trophic network in which the traditional classification of agents in an ecosystem is detected, from producers to top-level predators, and a worldwide network of autonomous systems depicting money transfers between Internet Service Providers. When tested on synthetic datasets, our algorithms produce smaller clustering errors than other state-of-the-art algorithms. Moreover, our methods only involve standard tools of linear algebra, which makes them efficient in terms of time and space complexity. Simple heuristics are also provided in order to automatically determine the number of blocks in block-cyclic and block-acyclic graphs.
Hence the approach we follow differs from other clustering methods for directed graphs: we restrict ourselves to two patterns of connection (cyclic and acyclic) but we make no assumption of regularity (for instance on the degrees of nodes). Moreover, our BCS and BAS clustering algorithms are able to focus on the cyclic or the acyclic patterns of connection respectively in block-cyclic or block-acyclic graphs, while neglecting any other pattern present in the graph. Our proposed algorithms are based on the computation of extremal eigenvalues and eigenvectors of a non-symmetric graph-related matrix, commonly called the transition matrix $$P$$. The proposed approaches amount to finding a set of right eigenvectors of $$P$$ verifying \begin{equation} Pu=\lambda u\text{ for }\lambda\in\mathbb{C}\text{, }|\lambda|\simeq 1\text{, }\lambda\neq 1 \end{equation} (1.1) and clustering the entries of these eigenvectors to recover the blocks of nodes. Hence, the process is similar to the well-known spectral clustering algorithm for the detection of clusters in undirected graphs, which is also based on the computation of extremal eigenvalues and eigenvectors of a graph-related matrix [13]. However, spectral clustering and extensions of spectral clustering to directed graphs are essentially based on the real spectrum of symmetric matrices associated to the graph [8, 14, 15]. In contrast, our method is based on the complex spectrum of a non-symmetric matrix. Hence, it keeps the intrinsically asymmetric information contained in directed graphs while having approximately the same time and space complexity as other spectral clustering algorithms. A recently published paper [16] exploits spectral information (in a different way than in the present paper) for solving a related problem, the detection of block-cyclic components in the communities of a graph, with a special focus on the three-block case.
In contrast, we focus on networks with a global block-cyclic structure and extend our method for the detection of acyclic structures, which we deem even more relevant than block-cyclicity in practical situations. Part of the results presented here were derived in an unpublished technical report [17]. This article offers more empirical validation and comparison with state-of-the-art competing techniques. The structure of this article is as follows. In Section 2, we describe related clustering methods for directed graphs. In Section 3, we present our BCS clustering algorithm for the detection of clusters in block-cyclic graphs. Then we describe the links between block-cyclic and block-acyclic graphs in Section 4. In Section 5, the BAS clustering algorithm is introduced for the detection of clusters in block-acyclic graphs. In Section 6, we analyse the performance of BCS and BAS clustering algorithms on synthetic data. Finally, in Section 7, we apply the BAS clustering algorithm to a real-world trophic network and a network of autonomous systems.

2. Related work

In this section, we present existing algorithms related to our work, including the classical spectral clustering algorithm and some existing algorithms for clustering directed graphs. In subsequent sections, we refer to a weighted directed graph as a triplet $$G=(V,E,W)$$ where $$V$$ is a set of nodes, $$E\subseteq V\times V$$ is a set of directed edges and $$W\in\mathbb{R}^{n\times n}_+$$ is a matrix of positive edge weights such that $$W_{uv}$$ is the weight of edge $$(u,v)$$ and for each $$u,v\in V$$, \begin{equation} W_{uv}>0\leftrightarrow (u,v)\in E. \end{equation} (2.1) When the graph is unweighted, we refer to it as a pair $$G=(V,E)$$ and the binary adjacency matrix is kept implicit. Moreover, when the graph is undirected, we have $$W=W^T$$. Finally, we refer to a right (resp.
left) eigenvector of a matrix $$A\in\mathbb{C}^{n\times n}$$ associated to an eigenvalue $$\lambda\in\mathbb{C}$$ as a vector $$u\in\mathbb{C}^n\setminus\{0\}$$ such that $$Au=\lambda u$$ (resp. $$u^HA=\lambda u^H$$).

2.1 Spectral clustering of undirected graphs

Spectral clustering uses eigenvalues and eigenvectors of a graph-related matrix (the Laplacian) to detect density-based clusters of nodes in an undirected graph, namely clusters with a high number of intra-cluster edges and a low number of inter-cluster edges [13]. The method can be decomposed into two steps. First, the nodes of the graph are mapped to points in a Euclidean space such that nodes that are likely to lie in the same cluster are mapped to points that are close to each other in this projection space. The second step of the method involves clustering the $$n$$ points in $$\mathbb{R}^k$$ using the k-means algorithm. The algorithm is based on spectral properties of the graph Laplacian $$L\in\mathbb{R}^{n\times n}$$ defined by \begin{equation} L_{uv}=\left\lbrace\begin{array}{ll} 1-\frac{W_{uu}}{d_u}&\text{ if }u=v\text{ and }d_u\neq 0\\ -\frac{W_{uv}}{\sqrt{d_ud_v}}&\text{ if }u\text{ and }v\text{ are adjacent}\\ 0&\text{ otherwise} \end{array}\right. \end{equation} (2.2) where $$W$$ is the adjacency matrix of the graph and $$d_u$$ is the degree of node $$u$$. If the target number of clusters is $$k$$, the right eigenvectors associated to the $$k$$ smallest eigenvalues of the Laplacian are extracted and stored as the columns of a matrix $$U\in\mathbb{R}^{n\times k}$$; the embeddings of the nodes are then given by the $$n$$ rows of $$U$$. One can justify this method in the following way. When clusters are disconnected, namely when the graph contains $$k$$ connected components, the rows of $$U$$ associated to nodes belonging to the same component are identical [13].
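This embedding step can be sketched in a few lines of numpy (an illustrative sketch, not the authors' implementation; it assumes all degrees are positive, so the first branch of Equation 2.2 reduces to the standard normalized Laplacian $$I-D^{-1/2}WD^{-1/2}$$). For a graph with two disconnected components, the identical-rows property can be checked directly:

```python
import numpy as np

def spectral_embedding(W, k):
    """Embed nodes as the rows of the matrix of the k eigenvectors of the
    normalized Laplacian L = I - D^{-1/2} W D^{-1/2} (Equation 2.2)
    associated to the k smallest eigenvalues. Assumes all degrees > 0."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return vecs[:, :k]               # rows are the node embeddings

# Two disjoint triangles: the embedding is constant on each component.
W = np.zeros((6, 6))
W[:3, :3] = 1 - np.eye(3)   # triangle on nodes 0-2
W[3:, 3:] = 1 - np.eye(3)   # triangle on nodes 3-5
U = spectral_embedding(W, 2)
```

Here the two columns of `U` span the zero eigenspace of the Laplacian, so rows belonging to the same component coincide while the two components map to distinct points.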
Hence, when perturbing this disconnected graph, namely in the presence of clusters with a low number of inter-cluster edges, nodes within the same clusters are mapped to points that tend to form clusters in $$\mathbb{R}^k$$. This is explained by the semi-simplicity of eigenvalue $$0$$ of the graph Laplacian, which implies the continuous dependence of associated right eigenvectors on the weights of edges in the graph [13]. In Sections 3 and 5, we show that a similar approach can be used to extract clusters in directed graphs with a cyclic or an acyclic pattern of connection between clusters: we also use the spectrum of a graph-related matrix to map nodes to points in a Euclidean space and cluster these points with the k-means algorithm.

2.2 Clustering algorithms for directed graphs

In this section, we describe existing clustering algorithms for the detection of pattern-based clusters in directed graphs, namely groups of nodes with similar connections to other groups in some sense. We focus on methods that are theoretically able to extract blocks from block-cyclic and block-acyclic graphs. Bibliometric symmetrization refers to a symmetrization of the adjacency matrix $$W$$ of $$G$$. The symmetrized matrix $$(1-\alpha)W^TW+\alpha WW^T$$ is defined as the adjacency matrix of an undirected graph $$G_u$$ for a certain choice of weighing parameter $$\alpha$$. This symmetric adjacency matrix is a linear combination of the co-coupling matrix $$WW^T$$ and the co-citation matrix $$W^TW$$. Then clustering methods for undirected graphs, such as a spectral clustering algorithm, are applied to $$G_u$$. This method is efficient for detecting co-citation networks [9]. The primary focus of random walk based clustering algorithms is the detection of density-based clusters [5], namely with a high number of intra-cluster edges and a low number of inter-cluster edges.
A symmetric Laplacian matrix for directed graphs based on the stationary probability distribution of a random walk is defined, and applying the classical spectral clustering algorithm to this Laplacian matrix leads to the extraction of clusters in which a random walker is likely to be trapped. To detect pattern-based clusters, an extension of this method was proposed in which a random walker alternatively moves forward, following the directionality of edges, and backwards, in the opposite direction [6]. This method successfully extracts clusters in citation-based networks. Similarly, another random walk based approach extends the definition of directed modularity to extract clusters of densely connected nodes with a cyclic pattern of connections between clusters [18, 19]. The blockmodelling approach is based on the extraction of functional classes from networks [7]. Each class corresponds to a node in an image graph, which describes the functional roles of classes of nodes and the overall pattern of connections between classes of nodes. A measure of how well a given directed graph fits an image graph is proposed. The optimal partitioning of nodes and image graph are obtained by maximizing this quality measure using alternating optimization combined with simulated annealing. Methods based on stochastic blockmodels were first defined for undirected networks and then extended to directed graphs in [20]. A stochastic blockmodel is a model of random graph. For a number $$k$$ of blocks, the parameters of a stochastic blockmodel are a vector of probabilities $$\gamma\in[0,1]^k$$ and a matrix $$\Phi\in[0,1]^{k\times k}$$. Each node is randomly assigned to a block with probabilities specified by $$\gamma$$ and the probability of having an edge $$(i,j)$$ for $$i$$ in block $$s$$ and $$j$$ in block $$t$$ is $$\Phi_{st}$$. For this reason, nodes within a block are said to be stochastically equivalent.
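For concreteness, a directed stochastic blockmodel of this kind can be sampled in a few lines; the sketch below (with the hypothetical helper name `sample_directed_sbm`, not from the paper) draws block labels from $$\gamma$$ and edges with the probabilities given by $$\Phi$$:

```python
import numpy as np

def sample_directed_sbm(gamma, Phi, n, seed=0):
    """Sample a directed stochastic blockmodel: node i is assigned to
    block z[i] with probabilities gamma, and edge (i, j) is present
    with probability Phi[z[i], z[j]]. Self-loops are excluded."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(gamma), size=n, p=gamma)
    # independent Bernoulli draws for every ordered pair (i, j)
    W = (rng.random((n, n)) < Phi[z[:, None], z[None, :]]).astype(float)
    np.fill_diagonal(W, 0.0)
    return W, z

# A cyclic blockmodel: block s connects only to block (s + 1) mod 3,
# so every realization is a (possibly sparse) block-cyclic graph.
Phi = np.array([[0.0, 0.9, 0.0],
                [0.0, 0.0, 0.9],
                [0.9, 0.0, 0.0]])
W, z = sample_directed_sbm([1/3, 1/3, 1/3], Phi, n=30)
```

With this cyclic choice of $$\Phi$$, every sampled edge $$(i,j)$$ satisfies $$z_j=(z_i+1)\bmod 3$$, which illustrates how a stochastic blockmodel can generate the block-cyclic graphs discussed later.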
One of the various methods for the detection of blocks in graphs generated by a stochastic blockmodel is based on the extraction of singular vectors of the adjacency matrix [8], which is similar to the bibliometric symmetrization combined with the classical spectral clustering algorithm. The common definition of the stochastic blockmodel implies that in- and out-degrees of nodes within blocks follow a Poisson binomial distribution [10, 21] and thus have the same expected value. As this assumption is not verified in most real-world directed networks, [12] proposed a degree-corrected stochastic blockmodel for directed graphs where additional variables are introduced, allowing more flexibility in the distribution of degrees of nodes within blocks. The partitioning of nodes is obtained by an expectation maximization process. Other statistical methods exist, such as the clustering algorithm for content-based networks [11]. This method is similar to stochastic blockmodelling but, instead of block-to-block probabilities of connection, it is based on node-to-block and block-to-node probabilities. The model parameters are adjusted through an expectation maximization algorithm. This approach can be viewed as another degree-corrected stochastic blockmodel; hence it is robust to high variations in the degrees of nodes, but it also involves a more complex optimization approach due to the higher number of parameters. Finally, some methods are based on the detection of roles in directed networks, such as [22], which defines the number of paths of given lengths starting or ending in a node as its features, from which node similarities are extracted. As we will see, our definition of block-cyclic and block-acyclic graphs does not include any constraint on the regularity of node features such as the number of incoming or outgoing paths. We are interested in the detection of clusters in block-cyclic and block-acyclic graphs.
Apart from the role model, the methods described above are all theoretically able to extract such clusters. Methods based on bibliometric symmetrization and stochastic blockmodels are able to detect such structures whenever the assumption of stochastic equivalence between nodes within blocks is verified. Provided that the graph is strongly connected, the method based on the two-step random walk can also be used. If degrees of nodes are large enough, the blockmodelling approach is also successful. However, the benchmark tests presented in Section 6 show that our algorithms outperform all these methods in the presence of high perturbations or when these assumptions are not fulfilled.

3. Spectral clustering algorithm for block-cycles

In this section, we describe a method for the extraction of blocks of nodes in block-cyclic graphs (or block-cycles). We recall that a block-cycle is a directed graph where nodes can be partitioned into non-empty blocks with a cyclic pattern of connections between blocks. We provide a formal definition of a block-cycle below.

Definition 3.1 (Block-cycle) A directed graph $$G=(V,E,W)$$ is a block-cycle of $$k$$ blocks if it contains at least one directed cycle of length $$k$$ and if there exists a function $$\tau:V\rightarrow\{1,\ldots,k\}$$ partitioning the nodes of $$V$$ into $$k$$ non-empty subsets, such that \begin{equation} E\subseteq\{(u,v)\text{ : }(\tau(u),\tau(v))\in \mathscr{C}\} \end{equation} (3.1) where $$\mathscr{C}=\{(1,2),(2,3),\ldots,(k-1,k),(k,1)\}$$.

Due to the equivalence between the existence of clusters in a graph and the block structure of the adjacency matrix, we use the terms ‘cluster’ and ‘block’ interchangeably. We also use the terms ‘block-cycle’ and ‘block-cyclic graph’ interchangeably. Figure 1b displays an example of a block-cycle. The blocks may contain any number of nodes (other than zero) and there can be any number of edges connecting a pair of consecutive blocks in the cycle.
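A graph satisfying the edge constraint of Definition 3.1 can be generated directly from a block assignment $$\tau$$. The sketch below (illustrative, with the assumed helper name `random_block_cycle`) gives every node at least one out-edge into its successor block, so the cyclic pattern $$\mathscr{C}$$ holds by construction:

```python
import numpy as np

def random_block_cycle(sizes, seed=0):
    """Build a random unweighted graph with the block-cyclic pattern of
    Definition 3.1: tau assigns each node to a block numbered 1..k, and
    every edge goes from block s to block (s mod k) + 1."""
    rng = np.random.default_rng(seed)
    k = len(sizes)
    tau = np.repeat(np.arange(1, k + 1), sizes)   # block label per node
    n = tau.size
    W = np.zeros((n, n))
    for u in range(n):
        succ = np.flatnonzero(tau == (tau[u] % k) + 1)  # next block
        m = rng.integers(1, succ.size + 1)              # at least 1 edge
        W[u, rng.choice(succ, size=m, replace=False)] = 1.0
    return W, tau

# Three blocks of arbitrary, unequal sizes (blocks need not be balanced).
W, tau = random_block_cycle([3, 4, 2])
```

Every edge of the resulting graph satisfies $$(\tau(u),\tau(v))\in\mathscr{C}$$, and every node has positive out-degree, as assumed later in Section 3.1.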
The definition implies that any block-cycle is k-partite [23]. However, the converse is not true, as the definition of a block-cycle includes an additional constraint on the directionality of edges. It is worth mentioning that, in the general case, a given block-cycle is unlikely to derive from a stochastic blockmodel, in which nodes within a block are stochastically equivalent [8]. Indeed, as mentioned before, stochastic equivalence implies that degrees of nodes within the same block are identically distributed. Our definition does not include such a regularity assumption. In the remainder of this section, we first analyse spectral properties of block-cycles. Then we formulate our spectral clustering algorithm for the detection of blocks in block-cycles.

3.1 Spectral properties of block-cycles

Up to a permutation of blocks, the adjacency matrix of a block-cycle is a block circulant matrix with nonzero blocks in the upper diagonal and in the bottom-left corner, as depicted in Fig. 2a. Given a perturbed block-cycle, our goal is to recover the partitioning of nodes into blocks, namely to provide an estimation $$\hat{\tau}$$ of $$\tau$$. We assume that a perturbed block-cycle is obtained by randomly appending edges to a block-cycle. Different distributions of the perturbing edges are considered: in particular, when the perturbing edges are uniformly distributed across the graph, a sufficient condition is provided, giving the maximum allowed number of perturbing edges that guarantees the success of our method. Experiments described in Section 6 also demonstrate the effectiveness of our method when the perturbation is not uniform (i.e. when the perturbing edges preferentially target some specific nodes).

Fig. 2. Adjacency matrix (a) and complex spectrum of the transition matrix (b) of a block-cycle of $$8$$ blocks.
To detect blocks in a block-cycle, we use the spectrum of a graph-related matrix, the transition matrix $$P\in\mathbb{R}^{n\times n}$$ associated to the Markov chain based on the graph: \begin{equation} P_{ij}=\left\lbrace\begin{array}{ll} \frac{W_{ij}}{d_i^{out}}&\text{ if }d_i^{out}\neq 0\\ 0&\text{ otherwise} \end{array}\right. \end{equation} (3.2) where $$W$$ is the weight matrix and $$d_i^{out}=\sum_jW_{ij}$$ is the out-degree of node $$i$$. The transition matrix is row stochastic, namely $$P\mathbf{1}=\mathbf{1}$$ where $$\mathbf{1}$$ represents the vector of ones. The basic spectral property of the transition matrix is that all its complex eigenvalues lie in the ball $$\{x\in\mathbb{C}\text{ : }|x|\leq 1\}$$ regardless of the associated graph [24]. This property, combined with the fact that the transition matrix of a block-cycle is block circulant [25], makes it possible to prove the following lemma. We make the assumption that $$d_i^{out}>0$$ for any node $$i$$.

Lemma 3.1 Let $$G=(V,E,W)$$ be a block-cycle with $$k$$ blocks $$V_1,\ldots,V_k$$ such that $$d_i^{out}>0$$ for all $$i\in V$$. Then $$\lambda_l=e^{-2\pi i\frac{l}{k}}\in spec(P)$$ for all $$0\leq l\leq k-1$$, namely there are $$k$$ eigenvalues located on a circle centered at the origin and with radius $$1$$ in the complex plane. The right eigenvector associated to the eigenvalue $$e^{-2\pi i\frac{l}{k}}$$ is \begin{equation} u^l_j=\left\{ \begin{array}{ll} \frac{1}{\sqrt{n}}e^{2\pi i\frac{lk}{k}} & j\in V_1\\ \frac{1}{\sqrt{n}}e^{2\pi i\frac{l(k-1)}{k}} & j\in V_2\\ \vdots & \\ \frac{1}{\sqrt{n}}e^{2\pi i\frac{l}{k}} & j\in V_k \end{array}\right. \end{equation} (3.3) for which $$Pu^l=\lambda_lu^l$$ and $$\Vert u^l\Vert_2=1$$.
Moreover, if $$G$$ is strongly connected, then the eigenvalues $$\lambda_0,\ldots,\lambda_{k-1}$$ have multiplicity $$1$$ and all other eigenvalues of $$P$$ have a modulus strictly lower than $$1$$. We refer to these eigenvalues and eigenvectors as the cycle eigenvalues and the right cycle eigenvectors. This spectral property is illustrated in Fig. 2b displaying the eigenvalues of the transition matrix of a block-cycle of eight blocks. Hence, computing the cycle eigenvalues and storing the corresponding right eigenvectors of $$P$$ as the columns of a matrix $$U\in\mathbb{C}^{n\times k}$$, the rows $$\{x^j\text{ : }1\leq j\leq n\}$$ define vector representations of the nodes of $$G$$. To recover the $$k$$ blocks of the block-cycle, we observe that the embeddings $$\{x^j\text{ : }j\in V^s\}$$ of the nodes in block $$V^s$$ are identical, being all equal to \begin{equation} c^s=\frac{1}{\sqrt{n}}\left[1,e^{\frac{2\pi i}{k}(k-s+1)},e^{\frac{4\pi i}{k}(k-s+1)},\ldots,e^{\frac{2\pi i}{k}(k-1)(k-s+1)}\right]\!. \end{equation} (3.4) We refer to the set of vectors $$\{c^s\text{ : }0\leq s\leq k-1\}$$ as block centroids of the block-cycle. In the presence of perturbing edges, the vectors $$\{x^j\text{ : } j\in V^s\}$$ are no longer identical but they form a cluster around the block centroid $$c^s$$. Hence, a way to recover the blocks of a perturbed block-cycle is to cluster the vectors $$\{x^j\text{ : }1\leq j\leq n\}$$ by assigning each node $$j$$ to the nearest block centroid, namely \begin{equation} \hat{\tau}(j)\leftarrow \underset{s}{\text{argmin}}\Vert x^j-c^s\Vert_2 \end{equation} (3.5) where $$\hat{\tau}(j)$$ is the estimated block assignment of node $$j$$. 
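On an unperturbed block-cycle this spectral embedding recovers the blocks exactly, which can be verified numerically. The sketch below is a simplified illustration, not the full BCS algorithm (which uses all $$k$$ cycle eigenvectors and k-means): it takes a single cycle eigenvector of the transition matrix, whose entries are constant on each block by Lemma 3.1, and groups coinciding entries:

```python
import numpy as np

def bcs_blocks(W, k, tol=1e-6):
    """Simplified sketch of the BCS idea on an unperturbed block-cycle:
    form the transition matrix P (Equation 3.2), take a right eigenvector
    for the cycle eigenvalue e^{-2*pi*i/k} (Lemma 3.1), and group nodes
    whose entries coincide. Assumes every out-degree is positive."""
    P = W / W.sum(axis=1, keepdims=True)
    vals, vecs = np.linalg.eig(P)
    # eigenvector whose eigenvalue is closest to the l = 1 cycle eigenvalue
    u = vecs[:, np.argmin(np.abs(vals - np.exp(-2j * np.pi / k)))]
    labels = -np.ones(len(u), dtype=int)
    block = 0
    for idx in range(len(u)):
        if labels[idx] < 0:                      # start a new block
            labels[np.abs(u - u[idx]) < tol] = block
            block += 1
    return labels

# Block-cycle of 3 blocks with all edges between consecutive blocks.
tau = np.repeat([0, 1, 2], [2, 3, 2])
W = ((tau[:, None] + 1) % 3 == tau[None, :]).astype(float)
labels = bcs_blocks(W, 3)
```

Since the entries of the cycle eigenvector take $$k$$ distinct values, one per block, grouping equal entries recovers the partition $$\tau$$ up to a relabelling of the blocks; in the perturbed case the entries only cluster around these values, which is why Equation 3.5 assigns each node to the nearest centroid instead.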
Theorem 3.2 justifies the approach by quantifying the effect of additive perturbations on the spectrum of the transition matrix of a block-cycle: starting with an unperturbed block-cycle $$G$$, we consider a graph $$\tilde{G}$$ obtained by appending perturbing edges to $$G$$ and we provide upper bounds on the perturbation of cycle eigenvalues, right cycle eigenvectors and node embeddings $$\{x^j\text{ : }1\leq j\leq n\}$$. These bounds are further used to provide a sufficient condition on the perturbation (Equation 3.12 below) for the method to succeed in recovering the blocks of nodes.

Theorem 3.2 Let $$G=(V,E,W)$$ be a strongly connected block-cycle with $$k$$ blocks $$V_1,\ldots,V_k$$ such that $$d_i^{out}>0$$ for all $$i\in V$$, let $$\lambda_0,\ldots,\lambda_{k-1}$$ be the $$k$$ cycle eigenvalues and $$u^0,\ldots,u^{k-1}$$ be the corresponding right cycle eigenvectors. Let $$\hat{G}=(V,\hat{E},\hat{W})$$ be a perturbed version of $$G$$ formed by appending positively weighted edges (excluding self-loops) to $$G$$. Let $$P$$ and $$\hat{P}$$ denote the transition matrices of $$G$$ and $$\hat{G}$$ respectively. We define the quantities \begin{equation} \begin{array}{rcl} \sigma &=&\underset{(i,j)\in\hat{E}}{\max}\text{ }\frac{\hat{d}_j^{in}}{\hat{d}_i^{out}}\\ \rho &=&\underset{i}{\max}\text{ }\frac{\hat{d}_i^{out}-d_i^{out}}{d_i^{out}} \end{array} \end{equation} (3.6) where $$d^{in}_i$$, $$d^{out}_i$$, $$\hat{d}^{in}_i$$ and $$\hat{d}^{out}_i$$ represent the in-degree and out-degree of the $$i$$-th node in $$G$$ and $$\hat{G}$$, respectively. Then, 1.
for any cycle eigenvalue $$\lambda_l\in spec(P)$$, there exists an eigenvalue $$\hat{\lambda}_l\in spec(\hat{P})$$ so that \begin{equation} \left\vert\hat{\lambda}_l-\lambda_l\right\vert \leq \sqrt{2n}\Vert f\Vert_2\sigma^{\frac{1}{2}}\rho^{\frac{1}{2}}+\mathscr{O}\left(\sigma\rho\right) \end{equation} (3.7) where $$f$$ is the (left) Perron eigenvector of $$P$$, namely the vector $$f$$ verifying $$f^TP=f^T$$ with $$f^T\mathbf{1}=1$$, 2. there exists a right eigenvector $$\hat{u}^l$$ of $$\hat{P}$$ associated to eigenvalue $$\hat{\lambda}_l$$ (i.e. for which $$\hat{P}\hat{u}^l=\hat{\lambda}_l\hat{u}^l$$) verifying \begin{equation} \Vert\hat{u}^l-u^l\Vert_2\leq \sqrt{2}\Vert (\lambda_lI-P)^{\#}\Vert_2 \sigma^{\frac{1}{2}}\rho^{\frac{1}{2}}+\mathscr{O}\left(\sigma\rho\right) \end{equation} (3.8) where $$u^l$$ is the right eigenvector of $$P$$ associated to eigenvalue $$\lambda_l$$ and $$(\lambda_lI-P)^{\#}$$ denotes the Drazin generalized inverse of $$(\lambda_lI-P)$$, 3. the node embeddings $$\{x^1,\ldots,x^n\}\subset\mathbb{C}^k$$ and $$\{\hat{x}^1,\ldots,\hat{x}^n\}\subset\mathbb{C}^k$$ defined as the rows of the right eigenvector matrices $$U=[u^0,\ldots,u^{k-1}]\in\mathbb{C}^{n\times k}$$ and $$\hat{U}=[\hat{u}^0,\ldots,\hat{u}^{k-1}]\in\mathbb{C}^{n\times k}$$, respectively, verify \begin{equation} \Vert x^j - \hat{x}^j\Vert_2 \leq k\sqrt{2\sigma\rho}\underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2+\mathscr{O}\left(\sigma\rho\right) \end{equation} (3.9) for each $$j\in\{1,\ldots,n\}$$. In addition to Theorem 3.2, Lemma 3.2 shows that the block centroids of a block-cycle are equidistant, with a pairwise Euclidean distance equal to $$\sqrt{2k/n}$$.
Lemma 3.2 The $$k$$ distinct row vectors $$c^0,\ldots,c^{k-1}$$ that constitute the set of the rows of the eigenvector matrix $$U$$ in the case of an unperturbed block-cycle (defined by Equation 3.4) are pairwise equidistant and, for all $$r\neq s\in\{0,\ldots,k-1\}$$, \begin{equation} \Vert c^r-c^s\Vert_2=\sqrt{\frac{2k}{n}} \end{equation} (3.10) Lemma 3.2 combined with the third claim of Theorem 3.2 provides a sufficient condition for the success of the method expressed by Equation 3.5, which assigns each node to the block with the nearest associated centroid. Indeed, neglecting the higher-order terms in Equation 3.9, each node vector $$\hat{x}^j$$ is guaranteed to lie closer to its block centroid $$c^{\tau(j)}$$ than to any other block centroid whenever the distance $$\Vert x^j-\hat{x}^j\Vert_2$$ does not exceed half of the pairwise distance $$\sqrt{2k/n}$$ between the centroids, namely \begin{equation} k\sqrt{2\sigma\rho} \underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2\leq \frac{1}{2}\sqrt{\frac{2k}{n}} \end{equation} (3.11) or equivalently \begin{equation} \rho\leq \frac{1}{4kn\sigma}\left(\underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2^2\right)^{-1}. \end{equation} (3.12) If the relative perturbation $$\rho$$ on the out-degrees does not exceed the bound expressed by the right-hand side of this inequality, the success of the clustering method expressed by Equation 3.5 is guaranteed. We now comment on the different quantities involved in the perturbation bounds of Theorem 3.2 and in the sufficient condition expressed by Equation 3.12. The quantities $$\sigma$$ and $$\rho$$ both depend on the presence of perturbing edges. The quantity $$\sigma$$ measures the discrepancy between the in-degree of the destination and the out-degree of the origin of each edge in the perturbed graph, and hence it is greater than $$1$$.
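Lemma 3.2 can be checked numerically on a toy fully connected block-cycle. The sketch below assumes (as `np.linalg.eig` does) that the cycle eigenvectors are normalized to unit 2-norm, which is our reading of the normalization of Equation 3.4; the construction itself is ours:

```python
import numpy as np

# Numerical check of Lemma 3.2: pairwise centroid distances equal sqrt(2k/n).
k, m = 6, 5
n = k * m
W = np.zeros((n, n))
for l in range(k):
    W[l*m:(l+1)*m, ((l+1) % k)*m:(((l+1) % k)+1)*m] = 1.0
P = W / W.sum(axis=1, keepdims=True)     # transition matrix of the block-cycle

vals, vecs = np.linalg.eig(P)
idx = np.argsort(-np.abs(vals))[:k]      # the k cycle eigenvalues
U = vecs[:, idx]                         # unit-norm right cycle eigenvectors

# one centroid per block (rows of U are constant within a block)
C = U[::m, :]
dists = [np.linalg.norm(C[r] - C[s]) for r in range(k) for s in range(r + 1, k)]
```

The arbitrary phases that `eig` attaches to each eigenvector do not affect the pairwise row distances, since every column is rescaled by a unit-modulus factor.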
It is close to $$1$$ in the particular case of a perturbed block-cycle with a uniform perturbation and if the block-cycle has homogeneous degrees. The quantity $$\rho$$ measures the relative discrepancy between out-degrees in the presence and in the absence of perturbation. In the particular case of a block-cycle with homogeneous out-degrees approximately equal to $$d^{out}$$, and a uniform perturbation of $$\alpha |E|$$ edges in total (for some $$\alpha>0$$), corresponding to $$\alpha d^{out}$$ perturbing out-edges for each node, we have $$\rho\simeq \alpha$$. In that particular case, Equation 3.12 states that the method is successful for a relative perturbation magnitude $$\alpha$$ satisfying \begin{equation} \frac{|\tilde{E}|}{|E|}=\alpha\leq \frac{1}{4kn}\left(\underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2^2\right)^{-1} \end{equation} (3.13) where $$\tilde{E}$$ represents the perturbing edges and $$E$$ represents the edges in the original block-cycle. In the first claim of Theorem 3.2 regarding the perturbation of the cycle eigenvalues, $$f$$ is the Perron eigenvector [26] with unit $$1$$-norm (i.e. the real-valued vector with nonnegative entries satisfying $$f^TP=f^T$$ and $$f^T\mathbf{1}=1$$). Thus $$\frac{1}{\sqrt{n}}\leq \Vert f\Vert_2\leq 1$$, with $$\Vert f\Vert_2=\frac{1}{\sqrt{n}}$$ when $$f$$ is constant, namely when the stationary probability of the random walk associated to $$P$$ is uniform over the vertices. This is the case, for instance, for a block-cycle generated by a stochastic blockmodel with similar probabilities of transition between any pair of consecutive blocks in the block-cycle. Regarding the perturbation of right eigenvectors and node embeddings (second and third claims of Theorem 3.2), Inequality 3.8 follows from the fact that the cycle eigenvalues are simple.
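The claim about $$\Vert f\Vert_2$$ can be illustrated numerically: for a balanced block-cycle the transition matrix is doubly stochastic, so the stationary distribution is uniform and $$\Vert f\Vert_2=1/\sqrt{n}$$. The construction is again our own toy example:

```python
import numpy as np

# Left Perron eigenvector f of P (f^T P = f^T, f^T 1 = 1) and its 2-norm,
# which enters the eigenvalue bound (3.7).  Toy balanced block-cycle (ours).
k, m = 4, 3
n = k * m
W = np.zeros((n, n))
for l in range(k):
    W[l*m:(l+1)*m, ((l+1) % k)*m:(((l+1) % k)+1)*m] = 1.0
P = W / W.sum(axis=1, keepdims=True)

vals, vecs = np.linalg.eig(P.T)              # left eigenvectors of P
f = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
f = f / f.sum()                              # normalize so that f^T 1 = 1
```

Since eigenvalue $$1$$ is real and simple for a strongly connected graph, the extracted eigenvector is real up to sign, and the normalization fixes the sign.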
Providing bounds on the norm of the Drazin generalized inverse of a non-symmetric matrix is tedious since it depends on the condition number of each eigenvalue of the matrix (see for instance [27] for bounds on the norm of $$(I-P)^{\#}$$ for a stochastic matrix $$P$$). However, based on [28], we show in Appendix A.2 that, when $$P$$ is diagonalizable, the norm of the Drazin generalized inverse of $$(\lambda_lI-P)$$ verifies \begin{equation} \Vert (\lambda_lI-P)^{\#}\Vert_2\leq (n-1)\left(\underset{\lambda\in \text{spec}(P)\setminus\{\lambda_l\}}{\min}|\lambda-\lambda_l|\right)^{-1} \Vert Y\Vert_2 \end{equation} (3.14) where $$\{u^r\text{ : }1\leq r\leq n\}$$ and $$\{y^r\text{ : }1\leq r\leq n\}$$ represent the right and left eigenvectors of $$P$$ such that $$\Vert u^1\Vert_2=\ldots=\Vert u^n\Vert_2=1$$ and $$(y^1)^Hu^1=\ldots=(y^n)^Hu^n=1$$, and $$Y=[y^1,\ldots,y^{l-1},y^{l+1},\ldots,y^n]$$ (the concatenation of the left eigenvectors other than $$y^l$$). For the particular case of a block-cycle, whenever the perturbation is low and the number of blocks is greater than $$7$$, we show in Appendix A.2 that the eigenvalue closest to a cycle eigenvalue is the closest other cycle eigenvalue, hence \begin{equation} \underset{\lambda\in \text{spec}(P)\setminus\{\lambda_l\}}{\min}|\lambda-\lambda_l|=2\sin\left(\frac{\pi}{k}\right)\!. \end{equation} (3.15) Moreover, the empirical observations described in Appendix A.2 show that \begin{equation} \Vert Y\Vert_2\lessapprox (n-1)\sqrt{n}\Vert f\Vert_2. \end{equation} (3.16) Combining the results of Equations 3.14, 3.15 and 3.16, we conclude the following: the perturbation bounds expressed by Equations 3.7, 3.8 and 3.9 in Theorem 3.2 are small for low values of $$k$$ and for low values of $$\sigma$$, $$\rho$$ and $$\Vert f\Vert_2$$.
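When $$P$$ is diagonalizable and $$\lambda_l$$ is a simple eigenvalue, the Drazin (group) inverse of $$(\lambda_lI-P)$$ can be formed explicitly from the eigendecomposition, which gives a direct way to evaluate the norm appearing in Equations 3.8, 3.9 and 3.14. The function below is a sketch of this construction (name and dense linear algebra are our choices):

```python
import numpy as np

def drazin_norm(P, lam, tol=1e-8):
    """2-norm of the Drazin (group) inverse of (lam*I - P), assuming P is
    diagonalizable and lam is a simple eigenvalue of P.  We invert
    (lam - mu) on every eigenspace with eigenvalue mu != lam, and map the
    eigenspace of lam itself to zero."""
    vals, U = np.linalg.eig(P)
    Yh = np.linalg.inv(U)      # Yh @ U = I: rows are scaled left eigenvectors
    g = np.zeros(len(vals), dtype=complex)
    mask = np.abs(vals - lam) >= tol
    g[mask] = 1.0 / (lam - vals[mask])
    return np.linalg.norm(U @ np.diag(g) @ Yh, 2)
```

On a diagonal example the result can be read off directly: for `P = diag(1, 0.5, 0)` and `lam = 1`, the group inverse is `diag(0, 2, 1)`, of spectral norm 2.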
For instance, this is typically the case when the perturbing edges are uniformly distributed across the graph and when the block-cycle consists of balanced blocks with similar numbers of edges between consecutive blocks. Similarly, the sufficient condition bounding the maximum allowed perturbation of the out-degrees of the nodes (Inequality 3.12) is looser when $$\sigma$$ and $$\Vert f\Vert_2$$ are low. For a uniformly perturbed block-cycle with balanced blocks and connections, the sufficient condition becomes Inequality 3.13, which restricts the allowed number of perturbing edges. In practice, the experiments on synthetic graphs described in Section 6 show that the method is also able to identify the blocks when the perturbation is structured rather than uniform (e.g. generated by a stochastic blockmodel with non-uniform probabilities of connection) and when the degrees of the nodes of the block-cycle are not homogeneous. In general, Equation 3.8 implies that the right cycle eigenvectors of a block-cycle vary continuously as functions of the entries of the transition matrix, and hence of the edge weights. Although the bounds provided by Theorem 3.2 can be quite loose in practice, the continuity of the cycle eigenvalues and of the right cycle eigenvectors holds for any strongly connected block-cycle. This continuity property provides the theoretical foundation of our spectral clustering algorithm.