# Spectral clustering algorithms for the detection of clusters in block-cyclic and block-acyclic graphs

## Abstract

We propose two spectral algorithms for partitioning nodes in directed graphs respectively with a cyclic and an acyclic pattern of connection between groups of nodes, referred to as blocks. Our methods are based on the computation of extremal eigenvalues of the transition matrix associated to the directed graph. The two algorithms outperform state-of-the-art methods for the detection of node clusters in synthetic block-cyclic or block-acyclic graphs, including methods based on blockmodels, bibliometric symmetrization and random walks. In particular, we demonstrate the ability of our algorithms to focus on the cyclic or the acyclic pattern of connections in directed graphs, even in the presence of edges that perturb these patterns. Our algorithms have the same space complexity as classical spectral clustering algorithms for undirected graphs, and their time complexity is also linear in the number of edges in the graph. One of our methods is applied to a trophic network based on predator–prey relationships: it successfully extracts common categories of preys and predators encountered in food chains. The same method is also applied to highlight the hierarchical structure of a worldwide network of autonomous systems depicting business agreements between Internet Service Providers.

## 1. Introduction

The past years have witnessed the emergence of large networks in various disciplines, including social science, biology and neuroscience. These networks model pairwise relationships between entities, such as predator–prey relationships in trophic networks or friendship in social networks. These structures are usually represented as graphs, where pairwise relationships are encoded as edges connecting vertices in the graph. When the relationships between entities are not bidirectional, the resulting graph is directed.
Some directed networks in real-world applications have a block-acyclic structure: nodes can be partitioned into groups of nodes such that the connections between groups form an acyclic pattern, as depicted in Fig. 1a. Such patterns are encountered in networks that tend to have a hierarchical structure, such as trophic networks modelling predator–prey relationships [1] or networks of autonomous systems where edges denote money transfers between Internet Service Providers [2]. On the other hand, one may encounter directed graphs with a block-cyclic structure (Fig. 1b) when the network models a cyclic phenomenon such as the carbon cycle [3]. These two patterns are intimately related, as the removal of a few edges from a block-cyclic graph makes it block-acyclic. This relationship is also observed in real-world networks: a graph of predator–prey interactions can be viewed as an acyclic version of the carbon cycle. In this article, we take advantage of this connection between the two types of patterns and formulate two closely related algorithms for the detection of groups of nodes respectively in block-acyclic and block-cyclic graphs, in the presence of slight perturbations in the form of edges that do not follow the block-cyclic or the block-acyclic pattern of connections in the graph.

Fig. 1. (a) Block-acyclic graph and (b) block-cyclic graph. Labels of blocks in the block-acyclic graph denote the ranking of blocks (topological order of blocks in the block-acyclic graph).

The partitioning of nodes in block-acyclic and block-cyclic networks can be viewed as a clustering problem. In graph mining, clustering refers to the task of grouping nodes that are similar in some sense. The resulting groups are called clusters.
In the case of directed graphs, the definition of similarity between two nodes may take the directionality of the edges incident to these nodes into account. Clustering algorithms that do so may be referred to as pattern-based clustering algorithms, which extract pattern-based clusters [4]: such methods produce a result in which nodes within the same cluster have similar connections with other clusters. Groups of nodes in block-acyclic and block-cyclic graphs are examples of pattern-based clusters. Several approaches have been proposed for the detection of pattern-based clusters in directed graphs [4]. Popular families of methods are random walk based algorithms, blockmodels (and more specifically stochastic blockmodels) and bibliometric symmetrization. Random walk based models are usually meant to detect density-based clusters [5]; however, by defining a two-step random walk as suggested in [6], pattern-based clusters such as blocks in block-cyclic graphs can also be detected. But the success of this method is guaranteed only when the graph is strongly connected, and the result is unreliable when the graph is sparse, with a high number of nodes with zero in-degree or out-degree. The blockmodelling approach [7] relies on the definition of an image graph representing connections between blocks of nodes in a graph; the block membership is selected so that the corresponding image graph is consistent with the edges of the original graph. However, in existing algorithms the optimization process relies, for instance, on simulated annealing, hence the computational cost is high and there is a risk of falling into a local optimum. Moreover, this method may also fail when the graph is sparse. Clustering algorithms based on stochastic blockmodels detect clusters of nodes that are stochastically equivalent.
In particular, the method proposed in [8] estimates the block membership of nodes by defining a vertex embedding based on the extraction of singular vectors of the adjacency matrix, which turns out to be efficient compared to the common methods based on expectation maximization. However, the assumption of stochastic equivalence implies that the degrees of nodes within clusters exhibit a certain regularity, as shown further. Hence, this approach may yield poor results in detecting clusters in real-world block-cyclic and block-acyclic networks. A related category of methods is bibliometric symmetrization, which defines a node similarity matrix as a weighted sum of the co-coupling matrix $$WW^T$$ and the co-citation matrix $$W^TW$$ [9], where $$W$$ is the adjacency matrix of the graph. However, it may also fail when the degrees of nodes are not sufficiently regular within groups. To relax this assumption, degree-corrected versions with variables representing the degrees of nodes were proposed [10, 11]. But fitting these models relies on costly methods that do not eliminate the risk of falling into a local optimum (simulated annealing, local heuristics, etc.) [12]. Hence, methods based on random walks, bibliometric symmetrization, and blockmodels with or without degree correction may yield poor results in the detection of blocks of nodes in block-cyclic and block-acyclic graphs, due to assumptions of connectivity or regularity or due to the computational difficulty of solving the associated optimization problems. The methods described in this article partly alleviate these weaknesses.

In this article, we present two new clustering algorithms that extract clusters in block-cyclic and block-acyclic graphs in the presence of perturbing edges. The first algorithm, called the Block-Cyclic Spectral (BCS) clustering algorithm, is designed for the detection of clusters in block-cyclic graphs.
The second algorithm, referred to as the Block-Acyclic Spectral (BAS) clustering algorithm, is a slight extension of the first one that is able to detect clusters in block-acyclic graphs. Two types of perturbation are considered in our study: in the first case, the perturbing edges are uniformly distributed across the graph, while, in the second case, the perturbation is not uniform, with some groups of nodes experiencing a higher volume of perturbing edges. A theoretical analysis of the effect of perturbing edges provides a sufficient condition on the perturbation for the success of our algorithms: in particular, when the perturbation is uniform, the condition specifies a maximum number of perturbing edges that can be tolerated by our BCS clustering algorithm to recover the blocks of a block-cyclic graph. Moreover, experimental results show that, even when the perturbation is not uniform, our algorithms are successful and outperform other approaches. In particular, experiments on synthetic unweighted graphs show that our BCS clustering algorithm is able to recover the blocks perfectly even when appending $$\mathscr{O}(|E|)$$ perturbing edges to a block-cyclic graph containing $$|E|$$ edges.1 We apply the second algorithm to two real-world datasets: a trophic network, in which the traditional classification of agents in an ecosystem is detected, from producers to top-level predators, and a worldwide network of autonomous systems depicting money transfers between Internet Service Providers. When tested on synthetic datasets, our algorithms produce smaller clustering errors than other state-of-the-art algorithms. Moreover, our methods only involve standard tools of linear algebra, which makes them efficient in terms of time and space complexity. Simple heuristics are also provided in order to automatically determine the number of blocks in block-cyclic and block-acyclic graphs.
Hence the approach we follow differs from other clustering methods for directed graphs: we restrict ourselves to two patterns of connection (cyclic and acyclic), but we make no assumption of regularity (for instance on the degrees of nodes). Moreover, our BCS and BAS clustering algorithms are able to focus on the cyclic or the acyclic pattern of connections respectively in block-cyclic or block-acyclic graphs, while neglecting any other pattern present in the graph. Our proposed algorithms are based on the computation of extremal eigenvalues and eigenvectors of a non-symmetric graph-related matrix, commonly called the transition matrix $$P$$. The proposed approaches amount to finding a set of right eigenvectors of $$P$$ verifying   $$Pu=\lambda u\text{ for }\lambda\in\mathbb{C}\text{, }|\lambda|\simeq 1\text{, }\lambda\neq 1$$ (1.1) and clustering the entries of these eigenvectors to recover the blocks of nodes. Hence, the process is similar to the well-known spectral clustering algorithm for the detection of clusters in undirected graphs, which is also based on the computation of extremal eigenvalues and eigenvectors of a graph-related matrix [13]. However, spectral clustering and its extensions to directed graphs are essentially based on the real spectrum of symmetric matrices associated to the graph [8, 14, 15]. In contrast, our method is based on the complex spectrum of a non-symmetric matrix. Hence, it keeps the intrinsically asymmetric information contained in directed graphs while having approximately the same time and space complexity as other spectral clustering algorithms. A recently published paper [16] exploits spectral information (in a different way than in the present paper) for solving a related problem, the detection of block-cyclic components in the communities of a graph, with a special focus on the three-block case.
In contrast, we focus on networks with a global block-cyclic structure and extend our method to the detection of acyclic structures, which we deem even more relevant than block-cyclicity in practical situations. Part of the results presented here were derived in an unpublished technical report [17]. This article offers more empirical validation and comparison with state-of-the-art competing techniques.

The structure of this article is as follows. In Section 2, we describe related clustering methods for directed graphs. In Section 3, we present our BCS clustering algorithm for the detection of clusters in block-cyclic graphs. Then we describe the links between block-cyclic and block-acyclic graphs in Section 4. In Section 5, the BAS clustering algorithm is introduced for the detection of clusters in block-acyclic graphs. In Section 6, we analyse the performance of BCS and BAS clustering algorithms on synthetic data. Finally, in Section 7, we apply the BAS clustering algorithm to a real-world trophic network and a network of autonomous systems.

## 2. Related work

In this section, we present existing algorithms related to our work, including the classical spectral clustering algorithm and some existing algorithms for clustering directed graphs. In subsequent sections, we refer to a weighted directed graph as a triplet $$G=(V,E,W)$$ where $$V$$ is a set of nodes, $$E\subseteq V\times V$$ is a set of directed edges and $$W\in\mathbb{R}^{n\times n}_+$$ is a matrix of positive edge weights such that $$W_{uv}$$ is the weight of edge $$(u,v)$$ and, for each $$u,v\in V$$,   $$W_{uv}>0\leftrightarrow (u,v)\in E.$$ (2.1) When the graph is unweighted, we refer to it as a pair $$G=(V,E)$$ and the binary adjacency matrix is kept implicit. Moreover, when the graph is undirected, we have $$W=W^T$$. Finally, we refer to a right (resp.
left) eigenvector of a matrix $$A\in\mathbb{C}^{n\times n}$$ associated to an eigenvalue $$\lambda\in\mathbb{C}$$ as a vector $$u\in\mathbb{C}^n\setminus\{0\}$$ such that $$Au=\lambda u$$ (resp. $$u^HA=\lambda u^H$$).

### 2.1 Spectral clustering of undirected graphs

Spectral clustering uses eigenvalues and eigenvectors of a graph-related matrix (the Laplacian) to detect density-based clusters of nodes in an undirected graph, namely clusters with a high number of intra-cluster edges and a low number of inter-cluster edges [13]. The method can be decomposed into two steps. First, the nodes of the graph are mapped to points in a Euclidean space such that nodes that are likely to lie in the same cluster are mapped to points that are close to each other in this projection space. The second step involves clustering the $$n$$ points in $$\mathbb{R}^k$$ using the k-means algorithm. The algorithm is based on spectral properties of the graph Laplacian $$L\in\mathbb{R}^{n\times n}$$ defined by   $$L_{uv}=\left\lbrace\begin{array}{ll} 1-\frac{W_{uu}}{d_u}&\text{ if }u=v\text{ and }d_u\neq 0\\ -\frac{W_{uv}}{\sqrt{d_ud_v}}&\text{ if }u\text{ and }v\text{ are adjacent}\\ 0&\text{ otherwise} \end{array}\right.$$ (2.2) where $$W$$ is the adjacency matrix of the graph and $$d_u$$ is the degree of node $$u$$. If the target number of clusters is $$k$$, extracting the right eigenvectors associated to the $$k$$ smallest eigenvalues of the Laplacian and storing them as the columns of a matrix $$U\in\mathbb{R}^{n\times k}$$, the embeddings of the nodes are given by the $$n$$ rows of $$U$$. One can justify this method in the following way: when the clusters are disconnected, namely when the graph contains $$k$$ connected components, the rows of $$U$$ associated to nodes belonging to the same component are identical [13].
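This property can be checked numerically. The sketch below is our own illustration (not code from the paper); it assumes every node has nonzero degree, builds the Laplacian of Equation (2.2) and embeds the nodes of a graph made of two disconnected triangles:

```python
import numpy as np

def normalized_laplacian(W):
    """Laplacian of Equation (2.2): L = I - D^{-1/2} W D^{-1/2},
    assuming all node degrees are nonzero."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    return np.eye(len(d)) - s[:, None] * W * s[None, :]

def spectral_embedding(W, k):
    """Node embeddings: rows of the matrix U whose columns are the
    eigenvectors of the k smallest eigenvalues of L."""
    vals, vecs = np.linalg.eigh(normalized_laplacian(W))  # L is symmetric here
    return vecs[:, np.argsort(vals)[:k]]

# Two disconnected triangles: the embedding is constant on each component.
W = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[a, b] = W[b, a] = 1.0
U = spectral_embedding(W, 2)
```

Running k-means on the rows of `U` then recovers the two components; when a few inter-cluster edges are added, the rows are perturbed but remain grouped.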
Hence, when perturbing this disconnected graph, namely in the presence of clusters with a low number of inter-cluster edges, nodes within the same cluster are mapped to points that tend to form clusters in $$\mathbb{R}^k$$. This is explained by the semi-simplicity of the eigenvalue $$0$$ of the graph Laplacian, which implies the continuous dependence of the associated right eigenvectors on the weights of the edges in the graph [13]. In Sections 3 and 5, we show that a similar approach can be used to extract clusters in directed graphs with a cyclic or an acyclic pattern of connection between clusters: we also use the spectrum of a graph-related matrix to map nodes to points in a Euclidean space and cluster these points with the k-means algorithm.

### 2.2 Clustering algorithms for directed graphs

In this section, we describe existing clustering algorithms for the detection of pattern-based clusters in directed graphs, namely groups of nodes with similar connections to other groups in some sense. We focus on methods that are theoretically able to extract blocks from block-cyclic and block-acyclic graphs.

Bibliometric symmetrization refers to a symmetrization of the adjacency matrix $$W$$ of $$G$$: the symmetrized matrix $$(1-\alpha)W^TW+\alpha WW^T$$ is defined as the adjacency matrix of an undirected graph $$G_u$$ for a certain choice of weighting parameter $$\alpha$$. This symmetric adjacency matrix is a linear combination of the co-coupling matrix $$WW^T$$ and the co-citation matrix $$W^TW$$. Then clustering methods for undirected graphs, such as a spectral clustering algorithm, are applied to $$G_u$$. This method is efficient for the detection of co-citation networks [9].

The primary focus of random walk based clustering algorithms is the detection of density-based clusters [5], namely clusters with a high number of intra-cluster edges and a low number of inter-cluster edges.
A symmetric Laplacian matrix for directed graphs, based on the stationary probability distribution of a random walk, is defined, and applying the classical spectral clustering algorithm to this Laplacian matrix leads to the extraction of clusters in which a random walker is likely to be trapped. To detect pattern-based clusters, an extension of this method was proposed in which a random walker alternately moves forward, following the directionality of edges, and backwards, in the opposite direction [6]. This method successfully extracts clusters in citation-based networks. Similarly, another random walk based approach extends the definition of directed modularity to extract clusters of densely connected nodes with a cyclic pattern of connections between clusters [18, 19].

The blockmodelling approach is based on the extraction of functional classes from networks [7]. Each class corresponds to a node in an image graph, which describes the functional roles of classes of nodes and the overall pattern of connections between classes of nodes. A measure of how well a given directed graph fits an image graph is proposed. The optimal partitioning of nodes and image graph are obtained by maximizing this quality measure using alternating optimization combined with simulated annealing.

Methods based on stochastic blockmodels were first defined for undirected networks and then extended to directed graphs in [20]. A stochastic blockmodel is a random graph model. For a number $$k$$ of blocks, the parameters of a stochastic blockmodel are a vector of probabilities $$\gamma\in[0,1]^k$$ and a matrix $$\Phi\in [0,1]^{k\times k}$$. Each node is randomly assigned to a block with probabilities specified by $$\gamma$$, and the probability of having an edge $$(i,j)$$ for $$i$$ in block $$s$$ and $$j$$ in block $$t$$ is $$\Phi_{st}$$. For this reason, nodes within a block are said to be stochastically equivalent.
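As a minimal sketch of this generative model (our own illustration; the `sample_sbm` helper and the three-block example are hypothetical, not the paper's code):

```python
import numpy as np

def sample_sbm(n, gamma, Phi, seed=None):
    """Sample a directed graph from a stochastic blockmodel: each node is
    assigned to a block with probabilities gamma, and an edge (i, j) with
    i in block s and j in block t appears with probability Phi[s, t]."""
    rng = np.random.default_rng(seed)
    k = len(gamma)
    tau = rng.choice(k, size=n, p=gamma)   # random block memberships
    probs = Phi[tau][:, tau]               # n x n matrix of edge probabilities
    W = (rng.random((n, n)) < probs).astype(float)
    np.fill_diagonal(W, 0.0)               # exclude self-loops
    return W, tau

# A block-cyclic image graph: block s connects only to block s+1 (mod 3).
Phi = np.zeros((3, 3))
Phi[0, 1] = Phi[1, 2] = Phi[2, 0] = 1.0
W, tau = sample_sbm(30, np.full(3, 1/3), Phi, seed=0)
```

With this choice of `Phi`, every sampled edge respects the cyclic pattern of Fig. 1b, and nodes within a block are stochastically equivalent by construction.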
One of the various methods for the detection of blocks in graphs generated by a stochastic blockmodel is based on the extraction of singular vectors of the adjacency matrix [8], which is similar to the bibliometric symmetrization combined with the classical spectral clustering algorithm. The common definition of the stochastic blockmodel implies that the in- and out-degrees of nodes within blocks follow a Poisson binomial distribution [10, 21] and thus have the same expected value. As this assumption is not verified in most real-world directed networks, [12] proposed a degree-corrected stochastic blockmodel for directed graphs, where additional variables are introduced, allowing more flexibility in the distribution of the degrees of nodes within blocks. The partitioning of nodes is obtained by an expectation maximization process. Other statistical methods exist, among which the so-called clustering algorithm for content-based networks [11]. This method is similar to stochastic blockmodelling but, instead of block-to-block probabilities of connection, it is based on node-to-block and block-to-node probabilities. The model parameters are adjusted through an expectation maximization algorithm. This approach can be viewed as another degree-corrected stochastic blockmodel; hence it is robust to high variations in the degrees of nodes, but it also involves a more complex optimization approach due to the higher number of parameters. Finally, some methods are based on the detection of roles in directed networks, such as [22], which defines the number of paths of given lengths starting or ending in a node as its features, from which node similarities are extracted. As we will see, our definition of block-cyclic and block-acyclic graphs does not include any constraint on the regularity of node features such as the number of incoming or outgoing paths. We are interested in the detection of clusters in block-cyclic and block-acyclic graphs.
Apart from the role model, the methods described above are all theoretically able to extract such clusters. Methods based on bibliometric symmetrization and stochastic blockmodels are able to detect such structures whenever the assumption of stochastic equivalence between nodes within blocks is verified. Provided that the graph is strongly connected, the method based on the two-step random walk can also be used. If the degrees of nodes are large enough, the blockmodelling approach is also successful. However, the benchmark tests presented in Section 6 show that our algorithms outperform all these methods in the presence of high perturbations or when these assumptions are not fulfilled.

## 3. Spectral clustering algorithm for block-cycles

In this section, we describe a method for the extraction of blocks of nodes in block-cyclic graphs (or block-cycles). We recall that a block-cycle is a directed graph where nodes can be partitioned into non-empty blocks with a cyclic pattern of connections between blocks. We provide a formal definition of a block-cycle below.

Definition 3.1 (Block-cycle) A directed graph $$G=(V,E,W)$$ is a block-cycle of $$k$$ blocks if it contains at least one directed cycle of length $$k$$ and if there exists a function $$\tau:V\rightarrow\{1,\ldots,k\}$$ partitioning the nodes of $$V$$ into $$k$$ non-empty subsets, such that   $$E\subseteq\{(u,v)\text{ : }(\tau(u),\tau(v))\in \mathscr{C}\}$$ (3.1) where $$\mathscr{C}=\{(1,2),(2,3),\ldots,(k-1,k),(k,1)\}$$.

Due to the equivalence between the existence of clusters in a graph and the block structure of the adjacency matrix, we use the terms ‘cluster’ and ‘block’ interchangeably. We also use the terms ‘block-cycle’ and ‘block-cyclic graph’ interchangeably. Figure 1b displays an example of a block-cycle. The blocks may contain any number of nodes (other than zero) and there can be any number of edges connecting a pair of consecutive blocks in the cycle. The definition implies that any block-cycle is $$k$$-partite [23].
However, the converse is not true, as the definition of a block-cycle includes an additional constraint on the directionality of edges. It is worth mentioning that, in the general case, a given block-cycle is unlikely to derive from a stochastic blockmodel, in which nodes within a block are stochastically equivalent [8]. Indeed, as mentioned before, stochastic equivalence implies that the degrees of nodes within the same block are identically distributed; our definition does not include such a regularity assumption. In the remainder of this section, we first analyse spectral properties of block-cycles. Then we formulate our spectral clustering algorithm for the detection of blocks in block-cycles.

### 3.1 Spectral properties of block-cycles

Up to a permutation of blocks, the adjacency matrix of a block-cycle is a block circulant matrix with nonzero blocks in the upper diagonal and in the bottom-left corner, as depicted in Fig. 2a. Given a perturbed block-cycle, our goal is to recover the partitioning of nodes into blocks, namely to provide an estimation $$\hat{\tau}$$ of $$\tau$$. We assume that a perturbed block-cycle is obtained by randomly appending edges to a block-cycle. Different distributions of the perturbing edges are considered: in particular, when the perturbing edges are uniformly distributed across the graph, a sufficient condition is provided, giving the maximum allowed number of perturbing edges that guarantees the success of our method. Experiments described in Section 6 also demonstrate the effectiveness of our method when the perturbation is not uniform (i.e. when the perturbing edges preferentially target some specific nodes).

Fig. 2. Adjacency matrix (a) and complex spectrum of the transition matrix (b) of a block-cycle of $$8$$ blocks.
To detect blocks in a block-cycle, we use the spectrum of a graph-related matrix, the transition matrix $$P\in\mathbb{R}^{n\times n}$$ associated to the Markov chain based on the graph:   $$P_{ij}=\left\lbrace\begin{array}{ll} \frac{W_{ij}}{d_i^{out}}&\text{ if }d_i^{out}\neq 0\\ 0&\text{ otherwise} \end{array}\right.$$ (3.2) where $$W$$ is the weight matrix and $$d_i^{out}=\sum_jW_{ij}$$ is the out-degree of node $$i$$. The transition matrix is row stochastic, namely $$P\mathbf{1}=\mathbf{1}$$ where $$\mathbf{1}$$ represents the vector of ones. The basic spectral property of the transition matrix is that all its complex eigenvalues lie in the disc $$\{x\in\mathbb{C}\text{ : }|x|\leq 1\}$$ regardless of the associated graph [24]. This property, combined with the fact that the transition matrix of a block-cycle is block circulant [25], makes it possible to prove the following lemma.2 We make the assumption that $$d_i^{out}>0$$ for every node $$i$$.

Lemma 3.1 Let $$G=(V,E,W)$$ be a block-cycle with $$k$$ blocks $$V_1,\ldots,V_k$$ such that $$d_i^{out}>0$$ for all $$i\in V$$. Then $$\lambda_l=e^{-2\pi i\frac{l}{k}}\in spec(P)$$ for all $$0\leq l\leq k-1$$, namely there are $$k$$ eigenvalues located on the circle centered at the origin with radius $$1$$ in the complex plane. The right eigenvector associated to the eigenvalue $$e^{-2\pi i\frac{l}{k}}$$ is   $$u^l_j=\left\{ \begin{array}{ll} \frac{1}{\sqrt{n}}e^{2\pi i\frac{lk}{k}} & j\in V_1\\ \frac{1}{\sqrt{n}}e^{2\pi i\frac{l(k-1)}{k}} & j\in V_2\\ \vdots & \\ \frac{1}{\sqrt{n}}e^{2\pi i\frac{l}{k}} & j\in V_k \end{array}\right.$$ (3.3) for which $$Pu^l=\lambda_lu^l$$ and $$\Vert u^l\Vert_2=1$$. Moreover, if $$G$$ is strongly connected, then the eigenvalues $$\lambda_0,\ldots,\lambda_{k-1}$$ have multiplicity $$1$$ and all other eigenvalues of $$P$$ have a modulus strictly lower than $$1$$. We refer to these eigenvalues and eigenvectors as the cycle eigenvalues and the right cycle eigenvectors.
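Lemma 3.1 is easy to check numerically. The sketch below (our own illustration; the dense block-cycle it builds is hypothetical) constructs the transition matrix of Equation (3.2) for a block-cycle of $$k=4$$ blocks and verifies that the four fourth roots of unity appear among its eigenvalues:

```python
import numpy as np

def transition_matrix(W):
    """Row-stochastic matrix of Equation (3.2), assuming every node
    has a positive out-degree."""
    return W / W.sum(axis=1, keepdims=True)

# Block-cycle with k = 4 blocks of m = 3 nodes: block s is completely
# connected to block s+1 (mod k).
k, m = 4, 3
n = k * m
W = np.zeros((n, n))
for s in range(k):
    t = (s + 1) % k
    W[s*m:(s+1)*m, t*m:(t+1)*m] = 1.0
P = transition_matrix(W)
eigvals = np.linalg.eigvals(P)
cycle_eigs = np.exp(-2j * np.pi * np.arange(k) / k)  # Lemma 3.1
```

For this particular block-cycle, all remaining eigenvalues of $$P$$ are zero, so the cycle eigenvalues are exactly the eigenvalues of modulus one.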
This spectral property is illustrated in Fig. 2b, displaying the eigenvalues of the transition matrix of a block-cycle of eight blocks. Hence, computing the cycle eigenvalues and storing the corresponding right eigenvectors of $$P$$ as the columns of a matrix $$U\in\mathbb{C}^{n\times k}$$, the rows $$\{x^j\text{ : }1\leq j\leq n\}$$ define vector representations of the nodes of $$G$$. To recover the $$k$$ blocks of the block-cycle, we observe that the embeddings $$\{x^j\text{ : }j\in V^s\}$$ of the nodes in block $$V^s$$ are identical, being all equal to   $$c^s=\frac{1}{\sqrt{n}}\left[1,e^{\frac{2\pi i}{k}(k-s+1)},e^{\frac{4\pi i}{k}(k-s+1)},\ldots,e^{\frac{2\pi i}{k}(k-1)(k-s+1)}\right]\!.$$ (3.4) We refer to the set of vectors $$\{c^s\text{ : }0\leq s\leq k-1\}$$ as the block centroids of the block-cycle. In the presence of perturbing edges, the vectors $$\{x^j\text{ : } j\in V^s\}$$ are no longer identical, but they form a cluster around the block centroid $$c^s$$. Hence, a way to recover the blocks of a perturbed block-cycle is to cluster the vectors $$\{x^j\text{ : }1\leq j\leq n\}$$ by assigning each node $$j$$ to the nearest block centroid, namely   $$\hat{\tau}(j)\leftarrow \underset{s}{\text{argmin}}\Vert x^j-c^s\Vert_2$$ (3.5) where $$\hat{\tau}(j)$$ is the estimated block assignment of node $$j$$.

Theorem 3.2 justifies this approach by quantifying the effect of additive perturbations on the spectrum of the transition matrix of a block-cycle: starting with an unperturbed block-cycle $$G$$, we consider a graph $$\tilde{G}$$ obtained by appending perturbing edges to $$G$$ and we provide upper bounds on the perturbation of the cycle eigenvalues, the right cycle eigenvectors and the node embeddings $$\{x^j\text{ : }1\leq j\leq n\}$$. These bounds are further used to provide a sufficient condition on the perturbation (Equation 3.12 below) for the method to succeed in recovering the blocks of nodes.3
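The embedding underlying Equation (3.5) can be sketched as follows (our own illustration; the example graph is hypothetical, and in practice the rows of $$U$$ would be clustered with k-means, since each computed eigenvector carries an arbitrary complex phase). For an unperturbed block-cycle, the rows associated with nodes of the same block coincide:

```python
import numpy as np

def bcs_embedding(W, k):
    """Node embeddings: rows of U, whose columns are the right
    eigenvectors of P associated with the k eigenvalues of largest
    modulus (the cycle eigenvalues, for a block-cycle)."""
    P = W / W.sum(axis=1, keepdims=True)
    vals, vecs = np.linalg.eig(P)
    idx = np.argsort(-np.abs(vals))[:k]
    U = vecs[:, idx]
    return U / np.linalg.norm(U, axis=0)   # unit-norm eigenvectors

# Block-cycle of k = 3 blocks of m = 4 nodes each.
k, m = 3, 4
W = np.zeros((k*m, k*m))
for s in range(k):
    t = (s + 1) % k
    W[s*m:(s+1)*m, t*m:(t+1)*m] = 1.0
U = bcs_embedding(W, k)
```

The numerically computed rows match the centroids of Equation (3.4) only up to a per-column phase, which does not affect the clustering of the rows.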
Theorem 3.2 Let $$G=(V,E,W)$$ be a strongly connected block-cycle with $$k$$ blocks $$V_1,\ldots,V_k$$ such that $$d_i^{out}>0$$ for all $$i\in V$$, let $$\lambda_0,\ldots,\lambda_{k-1}$$ be the $$k$$ cycle eigenvalues and $$u^0,\ldots,u^{k-1}$$ be the corresponding right cycle eigenvectors. Let $$\hat{G}=(V,\hat{E},\hat{W})$$ be a perturbed version of $$G$$ formed by appending positively weighted edges (excluding self-loops) to $$G$$. Let $$P$$ and $$\hat{P}$$ denote the transition matrices of $$G$$ and $$\hat{G}$$, respectively. We define the quantities   $$\begin{array}{rcl} \sigma &=&\underset{(i,j)\in\hat{E}}{\max}\text{ }\frac{\hat{d}_j^{in}}{\hat{d}_i^{out}}\\ \rho &=&\underset{i}{\max}\text{ }\frac{\hat{d}_i^{out}-d_i^{out}}{d_i^{out}} \end{array}$$ (3.6) where $$d^{in}_i$$, $$d^{out}_i$$, $$\hat{d}^{in}_i$$ and $$\hat{d}^{out}_i$$ represent the in-degree and out-degree of the $$i$$-th node in $$G$$ and $$\hat{G}$$, respectively. Then,

1. for any cycle eigenvalue $$\lambda_l\in spec(P)$$, there exists an eigenvalue $$\hat{\lambda}_l\in spec(\hat{P})$$ such that   $$\left\vert\hat{\lambda}_l-\lambda_l\right\vert \leq \sqrt{2n}\Vert f\Vert_2\sigma^{\frac{1}{2}}\rho^{\frac{1}{2}}+\mathscr{O}\left(\sigma\rho\right)$$ (3.7) where $$f$$ is the (left) Perron eigenvector of $$P$$, namely the vector $$f$$ verifying $$f^TP=f^T$$ with $$f^T\mathbf{1}=1$$;

2. there exists a right eigenvector $$\hat{u}^l$$ of $$\hat{P}$$ associated to the eigenvalue $$\hat{\lambda}_l$$ (i.e. for which $$\hat{P}\hat{u}^l=\hat{\lambda}_l\hat{u}^l$$) verifying   $$\Vert\hat{u}^l-u^l\Vert_2\leq \sqrt{2}\Vert (\lambda_lI-P)^{\#}\Vert_2 \sigma^{\frac{1}{2}}\rho^{\frac{1}{2}}+\mathscr{O}\left(\sigma\rho\right)$$ (3.8) where $$u^l$$ is the right eigenvector of $$P$$ associated to the eigenvalue $$\lambda_l$$ and $$(\lambda_lI-P)^{\#}$$ denotes the Drazin generalized inverse of $$(\lambda_lI-P)$$;

3.
the node embeddings $$\{x^1,\ldots,x^n\}\subset\mathbb{C}^k$$ and $$\{\hat{x}^1,\ldots,\hat{x}^n\}\subset\mathbb{C}^k$$, defined as the rows of the right eigenvector matrices $$U=[u^0,\ldots,u^{k-1}]\in\mathbb{C}^{n\times k}$$ and $$\hat{U}=[\hat{u}^0,\ldots,\hat{u}^{k-1}]\in\mathbb{C}^{n\times k}$$, respectively, verify   $$\Vert x^j - \hat{x}^j\Vert_2 \leq k\sqrt{2\sigma\rho}\underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2+\mathscr{O}\left(\sigma\rho\right)$$ (3.9) for each $$j\in\{1,\ldots,n\}$$.

In addition to Theorem 3.2, Lemma 3.2 shows that the block centroids of a block-cycle are equidistant,4 with a pairwise Euclidean distance equal to $$\sqrt{2k/n}$$.

Lemma 3.2 The $$k$$ distinct row vectors $$c^0,\ldots,c^{k-1}$$ that constitute the set of the rows of the eigenvector matrix $$U$$ in the case of an unperturbed block-cycle (defined by Equation 3.4) are pairwise equidistant: for all $$r\neq s\in\{0,\ldots,k-1\}$$,   $$\Vert c^r-c^s\Vert_2=\sqrt{\frac{2k}{n}}.$$ (3.10)

Lemma 3.2, combined with the third claim of Theorem 3.2, provides a sufficient condition for the success of the method expressed by Equation 3.5, which assigns each node to the block with the nearest associated centroid.
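Lemma 3.2 can be verified numerically. In the sketch below (our own illustration), we index blocks from $$0$$ so that the entries of the centroids of Equation (3.4) read $$e^{2\pi i\,l(k-s)/k}/\sqrt{n}$$; the entries of $$c^r-c^s$$ are then differences of roots of unity, and the squared distance sums to $$2k/n$$ for every pair $$r\neq s$$:

```python
import numpy as np

def block_centroids(n, k):
    """Centroids c^s of Equation (3.4): entry l of c^s is
    exp(2*pi*1j*l*(k - s)/k) / sqrt(n), blocks indexed 0..k-1."""
    s = np.arange(k)[:, None]   # block index
    l = np.arange(k)[None, :]   # eigenvector index
    return np.exp(2j * np.pi * l * (k - s) / k) / np.sqrt(n)

# All pairwise distances should equal sqrt(2k/n) = sqrt(0.5) here.
C = block_centroids(n=20, k=5)
```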
Indeed, to ensure that each node vector $$\hat{x}^j$$ is located closer to its block centroid $$c^{\tau(j)}$$ than to the other block centroids, and if we neglect higher order terms in Equation 3.9, a sufficient condition is that the distance $$\Vert x^j-\hat{x}^j\Vert_2$$ does not exceed half of the pairwise distance $$\sqrt{2k/n}$$ between the centroids, namely   $$k\sqrt{2\sigma\rho} \underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2\leq \frac{1}{2}\sqrt{\frac{2k}{n}}$$ (3.11) or equivalently   $$\rho\leq \frac{1}{4kn\sigma}\left(\underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2^2\right)^{-1}.$$ (3.12) If the relative perturbation $$\rho$$ on the out-degrees does not exceed the bound expressed by the right-hand side of the inequality, the success of the clustering method expressed by Equation 3.5 is guaranteed. We make a few comments about the different quantities involved in the perturbation bounds of Theorem 3.2 and the sufficient condition expressed by Equation 3.12. The quantities $$\sigma$$ and $$\rho$$ both depend on the presence of perturbing edges. The quantity $$\sigma$$ measures the discrepancy between the in-degree of the destination and the out-degree of the origin for all edges in the perturbed graph and hence it is greater than $$1$$. It is close to $$1$$ in the particular case of a perturbed block-cycle with a uniform perturbation and if the block-cycle has homogeneous degrees. The quantity $$\rho$$ measures the relative discrepancy between out-degrees in the presence and in the absence of perturbation. In the particular case of a block-cycle with homogeneous out-degrees approximately equal to $$d^{out}$$, and a uniform perturbation of $$\alpha |E|$$ edges in total (for some $$\alpha>0$$), corresponding to $$\alpha d^{out}$$ perturbing out-edges for each node, we have $$\rho\simeq \alpha$$.
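The quantities of Equation 3.6 follow directly from the degree sequences of $$G$$ and $$\hat{G}$$. A minimal Python sketch (the toy 3-cycle below is a hypothetical example, not taken from the paper):

```python
import numpy as np

def sigma_rho(W, W_hat):
    """Perturbation quantities sigma and rho of Equation 3.6, computed from the
    adjacency matrix W of the block-cycle G and the adjacency matrix W_hat of
    its perturbed version G_hat (every node must have d_i^out > 0 in G)."""
    d_out = W.sum(axis=1)                   # out-degrees in G
    dh_out = W_hat.sum(axis=1)              # out-degrees in G_hat
    dh_in = W_hat.sum(axis=0)               # in-degrees in G_hat
    i, j = np.nonzero(W_hat)                # edges (i, j) of G_hat
    sigma = np.max(dh_in[j] / dh_out[i])    # max over edges of d^in_j / d^out_i
    rho = np.max((dh_out - d_out) / d_out)  # max relative out-degree increase
    return sigma, rho

# Toy example: a 3-cycle perturbed by one extra edge 0 -> 2
W = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
W_hat = W.copy()
W_hat[0, 2] = 1
sigma, rho = sigma_rho(W, W_hat)            # here sigma = 2.0 and rho = 1.0
```

In this toy case the single perturbing edge doubles the out-degree of node 0 ($$\rho=1$$) and makes node 2's in-degree twice node 1's out-degree ($$\sigma=2$$).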
In that particular case, Equation 3.12 states that the method is successful for a relative perturbation magnitude $$\alpha$$ satisfying   $$\frac{|\tilde{E}|}{|E|}=\alpha\leq \frac{1}{4kn}\left(\underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2^2\right)^{-1}$$ (3.13) where $$\tilde{E}$$ represents the perturbing edges and $$E$$ represents the edges in the original block-cycle. In the first claim of Theorem 3.2 regarding the perturbation of the cycle eigenvalues, $$f$$ is the Perron eigenvector [26] with unit $$1$$-norm (i.e. the real valued vector with nonnegative entries verifying $$f^TP=f^T$$ and $$f^T\mathbf{1}=1$$). Thus $$\frac{1}{\sqrt{n}}\leq \Vert f\Vert_2\leq 1$$ and $$\Vert f\Vert_2=\frac{1}{\sqrt{n}}$$ when $$f$$ is constant, namely when the stationary probability of the random walk associated to $$P$$ is uniform over the vertices. This is the case for instance for a block-cycle generated by a stochastic blockmodel with similar probabilities of transitions between any pair of consecutive blocks in the block-cycle. Regarding the perturbation of the right eigenvectors and node embeddings (second and third claims of Theorem 3.2), Inequality 3.8 follows from the fact that the cycle eigenvalues are simple. Providing bounds on the norm of the Drazin generalized inverse of a non-symmetric matrix is tedious since it depends on the condition number of each eigenvalue of the matrix (see for instance [27] for bounds on the norm of $$(I-P)^{\#}$$ for a stochastic matrix $$P$$).
However, based on [28], we show in Appendix A.2 that, when $$P$$ is diagonalizable, the norm of the Drazin generalized inverse of $$(\lambda_lI-P)$$ verifies   $$\Vert (\lambda_lI-P)^{\#}\Vert_2\leq (n-1)\left(\underset{\lambda\in \text{spec}(P)\setminus\{\lambda_l\}}{\min}|\lambda-\lambda_l|\right)^{-1} \Vert Y\Vert_2$$ (3.14) where $$\{u^r\text{ : }1\leq r\leq n\}$$ and $$\{y^r\text{ : }1\leq r\leq n\}$$ represent the right and left eigenvectors of $$P$$ such that $$\Vert u^1\Vert_2=...=\Vert u^n\Vert_2=1$$ and $$(y^1)^Hu^1=...=(y^n)^Hu^n=1$$, and $$Y=[y^1,...,y^{l-1},y^{l+1},...,y^n]$$ (concatenation of left eigenvectors). For the particular case of a block-cycle, whenever the perturbation is low and the number of blocks is greater than $$7$$, we show in Appendix A.2 that the eigenvalue closest to a cycle eigenvalue is the closest other cycle eigenvalue, hence   $$\underset{\lambda\in \text{spec}(P)\setminus\{\lambda_l\}}{\min}|\lambda-\lambda_l|=2\sin\left(\frac{\pi}{k}\right)\!.$$ (3.15) Moreover, the empirical observations described in Appendix A.2 show that   $$\Vert Y\Vert_2\lessapprox (n-1)\sqrt{n}\Vert f\Vert_2.$$ (3.16) Combining the results of Equations 3.14, 3.15 and 3.16, we conclude the following. The perturbation bounds expressed by Equations 3.7, 3.8 and 3.9 in Theorem 3.2 are small for low values of $$k$$ and for low values of $$\sigma$$, $$\rho$$ and $$\Vert f\Vert_2$$. For instance, this is typically the case when the perturbing edges are uniformly distributed across the graph and when the block-cycle consists of balanced blocks with similar numbers of edges among the consecutive blocks of the block-cycle. Similarly, the sufficient condition bounding the maximum allowed perturbation on the out-degrees of the nodes (inequality 3.12) is looser when $$\sigma$$ and $$\Vert f\Vert_2$$ are low.
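The gap of Equation 3.15 can be observed on a toy unperturbed block-cycle. The sketch below assumes equal block sizes and complete connections between consecutive blocks (a hypothetical instance); the nonzero eigenvalues are then the $$k$$-th roots of unity, and each cycle eigenvalue lies at distance $$2\sin(\pi/k)$$ from the nearest other eigenvalue, the bulk eigenvalues at $$0$$ being farther away for this choice of $$k$$:

```python
import numpy as np

k, m = 8, 3                                    # hypothetical: 8 blocks of 3 nodes
C = np.roll(np.eye(k), 1, axis=1)              # cyclic block pattern l -> l+1 mod k
W = np.kron(C, np.ones((m, m)))                # every node linked to the next block
P = W / W.sum(axis=1, keepdims=True)           # transition matrix

lam = np.linalg.eigvals(P)
cycle = np.exp(2j * np.pi * np.arange(k) / k)  # expected cycle eigenvalues
for mu in cycle:
    # distance from mu to the nearest *other* eigenvalue (Equation 3.15)
    gap = np.min(np.abs(lam[np.abs(lam - mu) > 1e-8] - mu))
    assert abs(gap - 2 * np.sin(np.pi / k)) < 1e-8
```

For smaller $$k$$ the bulk eigenvalues near $$0$$ become the closest ones, which is consistent with the restriction on the number of blocks stated above.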
For a uniformly perturbed block-cycle with balanced blocks and connections, the sufficient condition becomes inequality 3.13, which restricts the allowed number of perturbing edges. In practice, experiments on synthetic graphs described in Section 6 show that the method is also able to identify the blocks when the perturbation is structured and not uniform (e.g. generated by a stochastic blockmodel with non-uniform probabilities of connection) and when the degrees of the nodes of the block-cycle are not homogeneous. In general, Equation 3.8 implies that the right cycle eigenvectors of a block-cycle vary continuously as functions of the entries of the transition matrix and hence of the edges’ weights. Although the bounds provided by Theorem 3.2 can be quite loose in practice, the continuity of the cycle eigenvalues and the right cycle eigenvectors is verified for any strongly connected block-cycle. This continuity property provides the theoretical foundation of our spectral clustering algorithm. It is worth mentioning that the eigenvectors of interest in classical spectral clustering also vary continuously as functions of the entries of the Laplacian matrix of an undirected graph, which also provides a theoretical justification for classical spectral clustering [13]. 3.2 Spectral clustering algorithm We now introduce our spectral clustering algorithm in a formal way. As mentioned, in the presence of perturbing edges satisfying the sufficient condition 3.12, when extracting the right eigenvectors associated to the $$k$$ eigenvalues of the transition matrix with largest modulus, the entries of these eigenvectors tend to form clusters around the block centroids. Hence, the blocks can be recovered by clustering the node embeddings defined by the right cycle eigenvectors. We provide two additional details about this clustering process.
First, instead of assigning each node to the block $$s$$ corresponding to the nearest block centroid $$c^s$$ as suggested in the previous section, we apply the k-means clustering algorithm to the node embeddings $$\{x^j\text{ : }1\leq j\leq n\}$$, which gives better results in practice since it updates the centroid positions in the clustering process. Second, we show that not all the $$k$$ cycle eigenvalues and right cycle eigenvectors must be computed. Indeed, from Lemma 3.1, we observe that the right cycle eigenvector associated to eigenvalue $$\lambda_0=1$$ is the constant vector $$\frac{\mathbf{1}}{\sqrt{n}}$$, which does not provide any information about the blocks. Moreover, the other right cycle eigenvectors form complex conjugate pairs. Hence, to avoid collecting redundant or irrelevant information, it suffices to find the $$\lfloor\frac{k}{2}\rfloor$$ cycle eigenvalues in   $$\{\lambda\in\mathbb{C}\text{ : }\text{Re}(\lambda)<1\text{, }\text{Im}(\lambda)\geq 0\}$$ (3.17) with largest modulus, and the associated right cycle eigenvectors. These eigenvectors are stored as the columns of a matrix $$\Gamma\in\mathbb{C}^{n\times \lfloor\frac{k}{2}\rfloor}$$ whose rows are clustered using the k-means algorithm. We define Algorithm 3.3 as the Block-Cyclic Spectral (BCS) clustering algorithm. Algorithm 3.3 (Block-Cyclic Spectral (BCS) clustering algorithm) Input: Adjacency matrix $$W\in\{0,1\}^{n\times n}$$ in which all nodes have nonzero out-degree; Parameters: $$k\in\{2,3,\ldots,n\}$$; 1. Compute the transition matrix $$P$$; 2. Find the $$\lfloor\frac{k}{2}\rfloor$$ eigenvalues $$\{\lambda_1,\ldots,\lambda_{\lfloor\frac{k}{2}\rfloor}\}$$ of $$P$$ with largest modulus and lying in $${\{\lambda\in\mathbb{C}\text{ : }\text{Re}(\lambda)<1\text{, }\text{Im}(\lambda)\geq 0\}}$$ and the associated right cycle eigenvectors $$\{u^1,\ldots,u^{\lfloor\frac{k}{2}\rfloor}\}$$ verifying $$Pu^l=\lambda_lu^l$$.
Store the eigenvectors as the columns of a matrix $$\Gamma\in \mathbb{C}^{n\times \lfloor\frac{k}{2}\rfloor}$$; 3. Consider each row of $$\Gamma$$ as a point in $$\mathbb{C}^{\lfloor\frac{k}{2}\rfloor}$$ and find k clusters of these points using a k-means algorithm. Let $$\phi:\{1,\ldots,n\}\rightarrow\{1,\ldots,k\}$$ be the function assigning each row of $$\Gamma$$ to a cluster; 4. Compute the estimation of the block membership function $$\hat{\tau}$$: $$\hat{\tau}(u)=\phi(u)$$ for all $$u\in\{1,\ldots,n\}$$; Output: estimation of block membership $$\hat{\tau}$$; We now make some observations about BCS clustering algorithm. Step 2 of the algorithm involves the computation of eigenvalues of a non-symmetric matrix. If the number $$n$$ of nodes is small, the QR algorithm can be used for the computation of the cycle eigenvalues and right cycle eigenvectors with a time complexity of $$\mathscr{O}(n^3)$$ operations. When the number $$n$$ of nodes is large, an iterative method (e.g. the Arnoldi iteration [29]) can be applied since the cycle eigenvalues are extremal: the cost for the computation of a cycle eigenpair $$(\lambda_l,u^l)$$ using an iterative method is $$\mathscr{O}(|E|\Vert (P-\lambda_l I)^{\#}\Vert_2)$$ where $$|E|$$ is the number of edges in the graph and $$\Vert (P-\lambda_l I)^{\#}\Vert_2$$ is the norm of the generalized inverse of $$(P-\lambda_lI)$$. However, from Equation 3.14 and as can be seen in the practical applications of Section 7, the gap between each cycle eigenvalue and the other eigenvalues is generally large and $$\Vert (P-\lambda_l I)^{\#}\Vert_2$$ is small. In this case, the cost for the computation of the cycle eigenvalues and right cycle eigenvectors is linear in the number $$|E|$$ of edges in the graph. We refer to [30] for advanced eigensolvers for non-symmetric matrices. The k-means method at step 3 can be Lloyd's algorithm [31].
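Steps 1-4 can be sketched as follows in Python. This is a minimal dense-algebra illustration, not the scalable implementation discussed above: a large sparse graph would call an Arnoldi-type eigensolver instead of `numpy.linalg.eig`, and a library k-means instead of the small Lloyd iteration with farthest-point seeding used here.

```python
import numpy as np

def bcs_clustering(W, k):
    """Minimal sketch of the BCS clustering algorithm (Algorithm 3.3)."""
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)          # step 1: transition matrix
    lam, U = np.linalg.eig(P)
    # Step 2: among eigenvalues with Re < 1 and Im >= 0, keep the floor(k/2)
    # of largest modulus together with their right eigenvectors.
    keep = (lam.real < 1 - 1e-9) & (lam.imag >= -1e-9)
    order = np.argsort(-np.abs(lam[keep]))[: k // 2]
    Gamma = U[:, keep][:, order]                  # n x floor(k/2), complex
    # Step 3: k-means on the rows of Gamma, viewed as real vectors.
    X = np.hstack([Gamma.real, Gamma.imag])
    C = [X[0]]                                    # farthest-point seeding
    for _ in range(k - 1):
        dists = np.min([((X - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(X[np.argmax(dists)])
    C = np.array(C)
    for _ in range(100):                          # Lloyd's iterations
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(axis=-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                C[c] = X[labels == c].mean(axis=0)
    return labels                                 # step 4: block estimates
```

On a pure block-cycle (e.g. four blocks of three nodes, each node linked to all nodes of the next block), the returned labels are constant within each block and distinct across blocks.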
Regarding the dimensionality of the complex node embeddings defined by the cycle eigenvectors, it is $$2\lfloor\frac{k}{2}\rfloor=k-1$$ when $$k$$ is odd and $$1+2(\frac{k}{2}-1)=k-1$$ when $$k$$ is even (since one of the extracted eigenpairs is real). Hence, the node embeddings are treated as $$(k-1)$$-dimensional vectors by Lloyd's algorithm, the time complexity of which is thus $$\mathscr{O}(k^2nI)$$ where $$I$$ is the maximum allowed number of iterations. In practice, the convergence of Lloyd's algorithm is fast in comparison with that of iterative methods for finding eigenvalues of non-symmetric matrices. Hence, in practice, the overall complexity of BCS clustering algorithm is linear in the number of edges in the graph. From Lemma 3.1, $$e^{\frac{2\pi i}{k}}$$ is an eigenvalue of the transition matrix of a block-cycle and the components of the associated right eigenvector are $$u_j=\frac{1}{\sqrt{n}} e^{2\pi i\frac{l-1}{k}}\text{, if node }j\text{ is in block }l$$. Hence, an alternative to Algorithm 3.3 is to extract this right eigenvector only and cluster its components in $$\mathbb{C}$$ to recover the block membership of nodes. However, as shown in experiments presented in Section 6, Algorithm 3.3 turns out to be more robust to perturbations for a similar time complexity when the number $$k$$ of blocks is small. Indeed, as the cycle eigenvalues are extremal eigenvalues, the cost of an iterative computation of the eigenvalues does not differ significantly for the extraction of one or all cycle eigenvalues. 3.3 Finding the number of blocks We now discuss the issue of finding a suitable value for the number $$k$$ of blocks. In the classical spectral clustering algorithm, the number $$k$$ of clusters is generally left as a parameter [13]. For some specific applications such as the trophic network and the network of Autonomous Systems presented in Section 7, the number of blocks is known (based on general knowledge about the networks).
However, when such knowledge is not available, we suggest the following heuristic (Algorithm 3.4), which is based on a slight variant of the average silhouette method [33], already successfully combined with algorithms such as k-means for finding the approximate number of clusters when clustering data points in a Euclidean space. Given a maximal number of potential clusters $$k^{\max}$$, Algorithm 3.4 computes the $$\lfloor\frac{k^{\max}}{2}\rfloor$$ eigenvectors of $$P$$ of interest and orders them in decreasing order of the modulus of the corresponding eigenvalues. Then, in an iterative fashion, it extracts the first $$k$$ eigenvectors for growing values of $$k$$ and clusters the rows of the corresponding matrix. An average silhouette score $$\zeta^k\in [0,1]$$ is computed for each value of $$k$$. If the data samples form $$k$$ cohesive clusters, $$\zeta^k$$ is expected to be close to $$1$$. The value of $$k$$ achieving the maximum score is returned. To speed up Algorithm 3.4, a slight modification is to only extract the right eigenvector $$u^1$$ associated to the eigenvalue $$\lambda^*$$ in $$R=\{\lambda\in\mathbb{C}\text{ : }\text{Re}(\lambda)<1\text{, }\text{Im}(\lambda)\geq 0\}$$, such that   $$\lambda^*=\underset{\lambda\in\text{spec}(P)\cap R}{\rm argmin}|\lambda-1|$$ (3.18) which is expected to correspond to the eigenvalue $$\lambda_1=e^{\frac{2\pi i}{k}}$$. Although the entries of eigenvector $$u^1$$ do form clusters corresponding to the blocks of nodes, this method for finding $$k$$ is less robust to perturbations, as shown in experiments presented in Section 6.6. 4. Spectral properties of nested block-cycles In this section, we provide an empirical analysis of an extension of block-cycles. The spectral properties of so-called nested block-cycles provide the basis for a clustering algorithm for block-acyclic graphs presented in Section 5. The formal definition of a nested block-cycle is given below.
Definition 4.1 (Nested block-cycle) A directed graph $$G=(V,E,W)$$ is a nested block-cycle of $$k$$ blocks if there exists a function $$\tau:V\rightarrow\{1,\ldots,k\}$$ partitioning the nodes of $$V$$ into $$k$$ non-empty blocks, such that   $$E\subseteq\{(u,v)\text{ : }\tau(u)<\tau(v)\text{ or }\tau(u) = k\}.$$ (4.1) An example of a nested block-cycle of four blocks is given in Fig. 3. In such a graph, the $$l$$-th block may be connected to blocks $$l+1,\ldots,k$$ for $$l<k$$ and the $$k$$-th block may be connected to all other blocks. Fig. 3. Nested block-cycle of four blocks (nodes represent blocks). We next provide an empirical analysis of the spectrum of the transition matrix of a nested block-cycle and verify to what extent properties of pure block-cycles are preserved in nested block-cycles. As a nested block-cycle may be aperiodic, its spectrum no longer contains $$k$$ eigenvalues with unit modulus. However, experiments allowed us to make the following observations: when a nested block-cycle of $$k$$ blocks contains a block-cycle of $$k$$ blocks as a subgraph that covers all nodes and that verifies all assumptions of Lemma 3.1, then 1. there are $$k$$ outlying eigenvalues in the spectrum of the transition matrix of $$G$$, namely $$k$$ eigenvalues with a significantly larger modulus compared with other eigenvalues, 2. the corresponding right eigenvectors verify the same clustering property as the right cycle eigenvectors of a block-cycle. First, to support the first observation above, we present the following experimental results.
Starting with a pure block-cycle of $$k=4$$ blocks, in which nodes are randomly assigned to blocks and the probability of existence of edges in the block-cycle is $$0.1$$, we randomly append edges satisfying the definition of a nested block-cycle, which transforms the block-cycle into a nested block-cycle such as the one depicted in Fig. 3. Figure 4 displays the evolution of the spectrum of the transition matrix as edges are appended to the block-cycle to create a nested block-cycle (we display all the eigenvalues of all transition matrices of the resulting nested block-cycles in the same figure). Although the spectrum is strongly perturbed compared to a pure block-cycle, there are four outlying eigenvalues regardless of the magnitude of the perturbation, which confirms the first claim. By analogy with the block-cyclic case, we refer to these eigenvalues as cycle eigenvalues. Figure 5 shows the proportion of misclassified nodes when applying BCS clustering algorithm to the resulting nested block-cycle as a function of the number of appended edges. We also display the proportion of misclassified nodes when the same number of edges are randomly appended. While the error seems to grow linearly for random perturbations, the error stays close to zero for the nested block-cycle no matter the magnitude of the perturbation, which supports the second claim above. Fig. 4. Eigenvalues of the transition matrices of nested block-cycles of four blocks obtained by appending edges iteratively to a block-cycle of four blocks.
The eigenvalues of all the resulting nested block-cycles are displayed together in the graph above. In each case, there are four eigenvalues with a significantly large modulus. Fig. 5. Evolution of the proportion of misclassified nodes as a function of the proportion of perturbing edges when applying BCS clustering algorithm respectively to a nested block-cycle and a randomly perturbed block-cycle of four blocks. The proportion of perturbing edges is the ratio between the number of appended edges and the number of edges in the original block-cycle. In addition to the above empirical results regarding the spectrum of nested block-cycles, we also prove that there are $$k$$ outlying eigenvalues in the specific case of a nested block-cycle with equally sized blocks and with all possible edges between the blocks. Such a nested block-cycle is referred to as a complete nested block-cycle since all edges allowed by Definition 4.1 are present in the graph. Lemma 4.1 states that there are at least $$k$$ eigenvalues of modulus greater than $$\frac{1}{2k-1}$$ for the transition matrix of such a nested block-cycle. Lemma 4.1 For a nested block-cycle $$G=(V,E)$$ of $$k$$ blocks of equal size, if the nested block-cycle is complete, then there exist at least $$k$$ eigenvalues $$\lambda_1,\ldots,\lambda_k$$ of the transition matrix $$P$$ such that   $$\vert \lambda_l\vert \geq \frac{1}{2k-1}$$ (4.2) for each $$l\in\{1,\ldots,k\}$$. Lemma 4.1 implies that there exist $$k$$ outlying eigenvalues for any complete nested block-cycle of equally sized blocks.
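Lemma 4.1 can be checked numerically. For equally sized blocks, the nonzero eigenvalues of the full transition matrix coincide with those of the $$k\times k$$ block-level transition matrix, which the sketch below (a hypothetical construction, using 0-indexed blocks) builds directly:

```python
import numpy as np

def complete_nested_block_cycle(k):
    """Block-level transition matrix of a complete nested block-cycle of k
    equally sized blocks: block l < k points to all blocks l+1, ..., k and the
    last block points back to all the other blocks (blocks are 0-indexed)."""
    B = np.zeros((k, k))
    for l in range(k - 1):
        B[l, l + 1:] = 1.0 / (k - 1 - l)   # uniform over the k - 1 - l later blocks
    B[k - 1, : k - 1] = 1.0 / (k - 1)      # last block points back to all others
    return B

# Lemma 4.1: at least k eigenvalues of modulus >= 1/(2k - 1)
for k in range(2, 8):
    lam = np.linalg.eigvals(complete_nested_block_cycle(k))
    assert np.sum(np.abs(lam) >= 1.0 / (2 * k - 1) - 1e-6) >= k
```

For $$k=3$$, for instance, the eigenvalues are $$1$$ and a double eigenvalue $$-1/2$$, comfortably above the bound $$1/5$$.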
The larger $$k$$, the lower the bound on the modulus of these $$k$$ eigenvalues. An empirical analysis confirms that the $$k$$ eigenvalues indeed get closer to $$0$$ for growing values of $$k$$. When the nested block-cycle is not complete, empirical tests such as the one reported in Fig. 4 show that there are still $$k$$ outlying eigenvalues and that the eigenvalues generally tend to decrease in magnitude when edges are appended to the graph. Moreover, as seen in Fig. 4, the conclusion is also valid when the blocks are not equally sized. Experiments in Section 6 show indeed that our BCS clustering algorithm is also capable of detecting the blocks of a nested block-cycle when the blocks have imbalanced sizes. Second, regarding the sensitivity of these cycle eigenvalues to perturbing edges and the clustering properties of the entries of the corresponding right cycle eigenvectors, we discuss to what extent the three claims of Theorem 3.2 are verified for nested block-cycles. If we define $$\sigma$$ and $$\rho$$ in the same way as in Theorem 3.2, and if the $$k$$ cycle eigenvalues are $$\{\lambda_0,\ldots,\lambda_{k-1}\}$$ and the corresponding right and left eigenvectors are $$\{u^0,\ldots,u^{k-1}\}$$ and $$\{y^0,\ldots,y^{k-1}\}$$, respectively, such that $$\Vert u^0\Vert_2=...=\Vert u^{k-1}\Vert_2=1$$ and $$(y^0)^Hu^0=...=(y^{k-1})^Hu^{k-1}=1$$, then, by extending the proof of Theorem 3.2 (found in the appendix), one can show that a perturbed version $$\hat{G}$$ of a nested block-cycle $$G$$, obtained by appending perturbing edges to $$G$$, has the following properties: 1. for any cycle eigenvalue $$\lambda_l\in\text{spec}(P)$$, there exists an eigenvalue $$\hat{\lambda}_l\in\text{spec}(\hat{P})$$ such that   $$\left\vert\hat{\lambda}_l-\lambda_l\right\vert \leq \sqrt{2\sigma\rho}\Vert y^l\Vert_2+\mathscr{O}\left(\sigma\rho\right)$$ (4.3) where $$P$$ and $$\hat{P}$$ are the transition matrices of $$G$$ and $$\hat{G}$$, respectively, 2.
there exists a right eigenvector $$\hat{u}^l$$ of $$\hat{P}$$ associated to eigenvalue $$\hat{\lambda}_l$$ verifying   $$\Vert\hat{u}^l-u^l\Vert_2\leq \sqrt{2}\Vert (\lambda_lI-P)^{\#}\Vert_2 \sigma^{\frac{1}{2}}\rho^{\frac{1}{2}}+\mathscr{O}\left(\sigma\rho\right)$$ (4.4) where $$u^l$$ is the right eigenvector of $$P$$ associated to eigenvalue $$\lambda_l$$ and $$(\lambda_lI-P)^{\#}$$ denotes the Drazin generalized inverse of $$(\lambda_lI-P)$$, 3. the node embeddings $$\{x^1,\ldots,x^n\}\subset\mathbb{C}^k$$ and $$\{\hat{x}^1,\ldots,\hat{x}^n\}\subset\mathbb{C}^k$$ defined as the rows of the right eigenvector matrices $$U=[u^1,\ldots,u^k]\in\mathbb{C}^{n\times k}$$ and $$\hat{U}=[\hat{u}^1,\ldots,\hat{u}^k]\in\mathbb{C}^{n\times k}$$, respectively, verify   $$\Vert x^j - \hat{x}^j\Vert_2 \leq k\sqrt{2\sigma\rho}\underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2+\mathscr{O}\left(\sigma\rho\right)$$ (4.5) for each $$j\in\{1,\ldots,n\}$$. The comments regarding the impact of the quantities $$\sigma$$ and $$\rho$$ on the perturbation of the spectrum of a block-cycle are still valid in the case of a nested block-cycle. In the particular case of a block-cycle with homogeneous degrees and a uniform perturbation, $$\sigma\simeq 1$$ and $$\rho$$ corresponds to the relative magnitude of the perturbation. However, since no closed-form solutions are available for the right and left cycle eigenvectors of a nested block-cycle, Equation 4.3 cannot be further transformed. Moreover, although the theoretical bound on $$\Vert (\lambda_lI-P)^{\#}\Vert_2$$ provided in Equation 3.14 is still valid (Section A.2), Equation 3.15 providing an empirical analysis of the distance between the cycle eigenvalues for block-cycles is no longer verified for nested block-cycles.
Experiments presented in Appendix A.4 analyse the evolution of both the norm of the left cycle eigenvectors $$\Vert y^l\Vert_2$$ and the norm of the Drazin generalized inverse $$\Vert (\lambda_lI-P)^{\#}\Vert_2$$ when edges are appended to a block-cycle to turn it into a nested block-cycle. 5. Spectral clustering algorithm for block-acyclic graphs In this section, we describe an algorithm for the extraction of blocks of vertices in block-acyclic graphs. This algorithm, referred to as the Block-Acyclic Spectral (BAS) clustering algorithm, is based on the spectral properties of nested block-cycles described in Section 4, and it is largely similar to BCS clustering algorithm. A block-acyclic graph is a directed graph where vertices can be partitioned into blocks with an acyclic pattern of connections between the blocks. We provide a formal definition of a block-acyclic graph below. Definition 5.1 (Block-acyclic graph) A directed graph $$G=(V,E,W)$$ is a block-acyclic graph of $$k$$ blocks if there exists a function $$\tau:V\rightarrow\{1,\ldots,k\}$$ partitioning the nodes of $$V$$ into $$k$$ non-empty blocks, such that   $$E\subseteq\{(u,v)\text{ : }\tau(u)<\tau(v)\}.$$ (5.1) The block membership function $$\tau$$ can be viewed as a ranking function such that any edge of the graph has its origin in a block of strictly lower rank than its destination. Figure 1a displays an example of block-acyclic graph. The adjacency matrix of a block-acyclic graph is strictly block upper triangular, as depicted in Fig. 6a. As in the case of a block-cycle, it is worth mentioning that a block-acyclic graph is unlikely to derive from a stochastic blockmodel as our definition does not include any regularity requirement on the degrees of nodes within blocks. Fig. 6. Adjacency matrix (a) and complex spectrum of the transition matrix (b) of a block-acyclic graph of eight blocks.
Our goal is to detect blocks of nodes in perturbed block-acyclic graphs, in which the perturbation takes the form of edges that do not follow the block-acyclic pattern of connections (as was assumed for perturbed block-cycles). The principle of our method is to append a few edges in order to transform the block-acyclic graph into a nested block-cycle with the same blocks of nodes. We then apply BCS algorithm to this graph to recover the blocks. The following transformation is performed on the block-acyclic graph. If a node is out-isolated (zero out-degree), we artificially add out-edges connecting this node to all other nodes of the graph. In this way, we append edges connecting the block of highest rank back to all other blocks. The resulting graph is a nested block-cycle and, as shown in Section 4, BCS clustering algorithm is able to recover blocks of nodes in such a graph. Again, we make use of the transition matrix of this modified graph, which we denote by $$P_a$$:   $$(P_a)_{ij}=\left\{ \begin{array}{ll} \frac{1}{d_i^{out}}W_{ij} & \text{ if }d_i^{out}>0\\ \frac{1}{n} & \text{ otherwise } \end{array}.\right.$$ (5.2) We note that this matrix is the transpose of the Google matrix with zero damping factor [34]. Matrix $$P_a$$ is the transition matrix of a nested block-cycle. Hence, as mentioned in Section 4, it has a spectral property similar to the one highlighted in the case of block-cycles: provided that, after appending out-edges to out-isolated nodes, the graph is strongly connected and that each node in block $$l<k$$ is connected to at least one node in block $$l+1$$, $$k$$ eigenvalues of the transition matrix have a modulus significantly larger than the moduli of the other complex eigenvalues, as shown in Fig. 6b.
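Equation 5.2 can be sketched in Python as follows (a minimal dense version; the toy block-acyclic matrix in the usage example is hypothetical):

```python
import numpy as np

def transition_matrix_acyclic(W):
    """Transition matrix P_a of Equation 5.2: rows of out-isolated nodes
    (zero out-degree) are set to the uniform distribution 1/n, which amounts
    to connecting the block of highest rank back to all other blocks."""
    n = W.shape[0]
    d_out = W.sum(axis=1)
    P = np.full((n, n), 1.0 / n)               # default: uniform (dangling) rows
    rows = d_out > 0
    P[rows] = W[rows] / d_out[rows][:, None]   # usual random-walk rows
    return P

# Toy block-acyclic graph on 3 nodes: 0 -> 1, 0 -> 2, 1 -> 2
W = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
P_a = transition_matrix_acyclic(W)             # row of node 2 becomes [1/3, 1/3, 1/3]
```

Every row of $$P_a$$ sums to one, so the matrix is stochastic even when the original graph has out-isolated nodes.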
Based on this observation, we formulate the BAS clustering algorithm (Algorithm 5.2), which is similar to BCS clustering algorithm. Both algorithms have approximately the same time and space complexity. As in the case of BCS clustering algorithm, we only compute the $$\lfloor\frac{k}{2}\rfloor$$ cycle eigenvalues in   $$\{\lambda\in\mathbb{C}\text{ : }\text{Re}(\lambda)<1\text{, }\text{Im}(\lambda)\geq 0\}$$ (5.3) with largest modulus, and the associated right cycle eigenvectors. As in the case of block-cycles, an extension similar to Algorithm 3.4 can also be proposed for finding the number of blocks in a block-acyclic graph. The process involves computing $$\lfloor \frac{k^{\max}}{2}\rfloor$$ eigenvectors of $$P_a$$ and iteratively clustering the entries of $$k\leq k^{\max}$$ eigenvectors in decreasing order of the modulus of the corresponding eigenvalues. The optimal number of clusters is the one achieving the highest average silhouette value, as expressed in Algorithm 3.4. The efficiency of the criterion is tested in Section 6.6. Algorithm 5.2 (Block-Acyclic Spectral (BAS) clustering algorithm) Input: Adjacency matrix $$W\in\{0,1\}^{n\times n}$$; Parameters: $$k\in\{2,3,\ldots,n\}$$; 1. Compute the transition matrix $$P_a$$; 2. Find the $$\lfloor\frac{k}{2}\rfloor$$ eigenvalues $$\{\lambda_1,\ldots,\lambda_{\lfloor\frac{k}{2}\rfloor}\}$$ of $$P_a$$ with largest modulus and lying in $$\{\lambda\in\mathbb{C}\text{ : }\text{Re}(\lambda)<1\text{, }\text{Im}(\lambda)\geq 0\}$$, and the associated right cycle eigenvectors verifying $$P_au^l=\lambda_lu^l$$. Store the eigenvectors as the columns of a matrix $$\Gamma\in \mathbb{C}^{n\times \lfloor\frac{k}{2}\rfloor}$$; 3. Consider each row of $$\Gamma$$ as a point in $$\mathbb{C}^{\lfloor\frac{k}{2}\rfloor}$$ and find k clusters of these points using a k-means algorithm. Let $$\phi:\{1,\ldots,n\}\rightarrow\{1,\ldots,k\}$$ be the function assigning each row of $$\Gamma$$ to a cluster; 4.
Compute the estimation of the block membership function $$\hat{\tau}$$: $$\hat{\tau}(u)=\phi(u)$$ for all $$u\in\{1,\ldots,n\}$$; Output: estimation of block membership $$\hat{\tau}$$ 6. Experiments In this section, we present the results of different experiments performed to validate the effectiveness of our algorithms. We first introduce the benchmark models used for the experiments. Then, the experiment of subsection 6.2 analyses the effect of perturbing edges on the cycle eigenvalues and right cycle eigenvectors and their impact on the quantities involved in the perturbation bounds of Theorem 3.2. Then, in subsections 6.3, 6.4 and 6.5, competing methods of directed graph clustering are described, tested on our benchmark datasets and compared with our algorithms. 6.1 Benchmark model The graphs used for testing our algorithms are based on stochastic blockmodels. We conduct two experiments on both BCS and BAS clustering algorithms. For the first experiment, two stochastic blockmodels are defined to generate two graphs: a block-cycle and a perturbing graph. Edges from both graphs are combined to form a perturbed block-cycle. As mentioned earlier in this article, a stochastic blockmodel is a model of random graph in which nodes within blocks are stochastically equivalent [8]. We formulate the following mathematical definition of a stochastic blockmodel. Definition 6.1 (Stochastic Blockmodel) Given positive integers $$n$$ and $$k\leq n$$ and parameters $$\gamma\in [0,1]^k$$ and $$\Phi\in [0,1]^{k\times k}$$, an unweighted random graph $$G=(V,E)$$ of $$n$$ nodes is generated by the stochastic blockmodel $$SBM(k,\gamma,\Phi)$$ with block membership function $$\tau:\{1,\ldots,n\}\rightarrow \{1,\ldots,k\}$$ if $$P[\tau(u)=i]=\gamma_i$$ $$\forall u\in V$$, $$i\in\{1,\ldots,k\}$$, and $$P[(u,v)\in E|\tau(u)=s,\tau(v)=t]=\Phi_{st}$$ $$\forall u,v\in V$$.
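Definition 6.1 translates directly into a sampler. The sketch below is a minimal version that, as an extra assumption not made in the definition, forbids self-loops:

```python
import numpy as np

def sample_sbm(n, k, gamma, Phi, rng):
    """Sample [G, tau] ~ SBM(k, gamma, Phi) as in Definition 6.1: draw each
    block label from gamma, then include each edge (u, v) independently with
    probability Phi[tau(u), tau(v)]. Self-loops are excluded here (an
    assumption of this sketch, not of Definition 6.1)."""
    tau = rng.choice(k, size=n, p=gamma)                  # block memberships
    W = (rng.random((n, n)) < Phi[np.ix_(tau, tau)]).astype(int)
    np.fill_diagonal(W, 0)                                # no self-loops
    return W, tau

# A block-cycle: Phi[s, t] = 0.9 when block t follows block s, 0 otherwise
k = 3
Phi = 0.9 * np.roll(np.eye(k), 1, axis=1)
W, tau = sample_sbm(60, k, np.full(k, 1 / k), Phi, np.random.default_rng(0))
```

With this choice of $$\Phi$$, every sampled edge goes from a block to the next block in the cycle, so the generated graph is a block-cycle.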
When considering a graph $$G=(V,E)$$ generated by the stochastic blockmodel $$SBM(k,\gamma,\Phi)$$ with random block membership function $$\tau$$, we use the notation $$[G,\tau]\sim SBM(k,\gamma,\Phi)$$. The graphs used in our first experiment are based on the combination of two random graphs: an unperturbed graph $$G=(V,E)$$ with block membership function $$\tau$$ such that $$[G,\tau]\sim SBM(k,\gamma,\Phi)$$, where parameter $$\Phi$$ is chosen so that $$G$$ is either a block-cycle or a block-acyclic graph, and a perturbing graph $$\tilde{G}=(V,\tilde{E})$$ with block membership function $$\tilde{\tau}$$ such that $$[\tilde{G},\tilde{\tau}]\sim SBM(\tilde{k},\tilde{\gamma},\tilde{\Phi})$$. We combine $$G$$ and $$\tilde{G}$$ to create a perturbed graph $$H=(V,E\bigcup \tilde{E})$$ and apply BCS or BAS clustering algorithm to provide an estimation $$\eta$$ of the block membership function $$\tau$$. Parameter $$\tilde{\Phi}$$ determines the distribution of perturbing edges in graph $$H$$. We combine two stochastic blockmodels in this way to take into account the fact that, in real-world applications, several complex models and phenomena may influence the shape of networks. This first test, applied to both BCS and BAS clustering algorithms, is intended to show how our algorithms perform on a standard network model in the presence of perturbation. Our second benchmark model highlights a case in which our algorithms are successful while other algorithms produce very poor results. We generate two different graphs based on the same stochastic blockmodel, $$[G_1=(V_1,E_1),\tau_1]\sim SBM(k,\gamma,\Phi)$$ and $$[G_2=(V_2,E_2),\tau_2]\sim SBM(k,\gamma,\Phi)$$, where $$\Phi$$ is chosen so that the graphs generated are both block-cyclic or both block-acyclic.
Then we combine both graphs by defining $$G=(V=V_1\cup V_2,E=E_1\cup E_2\cup E_{c})$$ where edges in $$E_{c}$$ are randomly selected according to the model   $$P[(u,v)\in E_c]=\left\lbrace \begin{array}{ll} \alpha \Phi_{\tau_1(u),\tau_2(v)} & \text{ if }u\in V_1\text{, }v\in V_2\\ \alpha \Phi_{\tau_2(u),\tau_1(v)} & \text{ if }u\in V_2\text{, }v\in V_1\\ 0 & \text{ otherwise } \end{array} \right.$$ (6.1) where $$\alpha\in [0,1]$$. If $$\alpha$$ is equal to $$1$$, then Model 6.1 corresponds to a standard stochastic blockmodel. If $$\alpha<1$$, the block shape of the adjacency matrix is preserved, but nodes within blocks are not stochastically equivalent. The graph is then perturbed in the same way as in the first experiment: a perturbing graph $$[\tilde{G},\tilde{\tau}]\sim SBM(\tilde{k},\tilde{\gamma},\tilde{\Phi})$$ of $$|V|$$ nodes is generated with $$\tilde{\Phi}=\epsilon Q$$, where $$Q\in [0,1]^{k\times k}$$ has random entries and $$\epsilon$$ controls the magnitude of the perturbation. The graphs are combined to form a perturbed block-cycle $$H=(V,E\cup \tilde{E})$$. This model illustrates the case where, although the graph might have a block-cyclic or a block-acyclic shape, nodes within a block have highly different distributions of edges towards other blocks and hence are not stochastically equivalent. This type of structure is also observed in real-world datasets such as trophic networks modelling predator–prey relationships. For instance, two top-level predators are both at the top of the food chain, but they might have highly different diets and different preys at different levels of the food chain. Finally, we use the two following error measures to assess the quality of the result [8].
Definition 6.2 (Block membership error) Given a graph $$G=(V,E)$$ with a certain block membership function $$\tau:V\rightarrow\{1,\ldots,k\}$$ and a perturbed version $$H=(V,E_H)$$ of graph $$G$$, if $$\eta:V\rightarrow\{1,\ldots,k\}$$ is an estimation of the block membership function $$\tau$$ based on $$H$$, then the block membership error is   $$\frac{1}{|V|}\underset{\pi\in\Pi(k)}{\min}|\{u\in V\text{ : }\tau(u)\neq \pi(\eta(u))\}|$$ (6.2) where $$\Pi(k)$$ is the set of permutations on $$\{1,\ldots,k\}$$. In other words, the block membership error computes the minimum number of differences in the entries of $$\tau$$ and $$\eta$$ when considering all permutations of the block labels $$\{1,\ldots,k\}$$. The computation of the block membership error can be formulated as a minimum matching problem which can be solved by the Hungarian algorithm [35]. We also compute an error based on the normalized mutual information (NMI) [36] defined below. The error measure associated to the NMI is $$1-NMI(\Omega,C)$$. Definition 6.3 (Normalized Mutual Information) Given ground-truth clusters $$C=\{c_1,\ldots,c_k\}$$ and estimations $$\Omega=\{\omega_1,\ldots,\omega_k\}$$ for a dataset consisting of $$n$$ data points, the normalized mutual information is defined as   $$NMI(\Omega,C)=\frac{2I(\Omega,C)}{H(\Omega)+H(C)}$$ (6.3) where $$I(\Omega,C)$$ is the mutual information between $$\Omega$$ and $$C$$ defined as   $$I(\Omega,C)=\sum_i\sum_j\frac{|\omega_j\cap c_i|}{n}\log\left(\frac{n|\omega_j\cap c_i|}{|\omega_j||c_i|}\right),$$ (6.4) $$H(C)$$ is the entropy of $$C$$ defined as   $$H(C)=-\sum_i\frac{|c_i|}{n}\log\left(\frac{|c_i|}{n}\right)$$ (6.5) and $$H(\Omega)$$ is the entropy of $$\Omega$$ (defined in the same way). 6.2 Perturbation analysis of block-cycles The following set of experiments provides an assessment of the effect of perturbing edges on the right cycle eigenvectors and the cycle eigenvalues of a block-cycle.
The effect of perturbations on the quantities appearing in the perturbation bound of Theorem 3.2 (e.g. $$\rho$$, $$\sigma$$, etc.) is also assessed. Two types of block-cycles are considered: one with balanced blocks and homogeneous degrees, the other with imbalanced blocks and inhomogeneous degrees. 6.2.1 Block-cycle with balanced blocks and homogeneous degrees To generate the graphs, we apply the first benchmark model described in Section 6.1. A block-cycle $$[G=(V,E),\tau]\sim SBM(k,\gamma,\Phi)$$ is generated, with $$k=8$$ blocks, $$n=|V|=1000$$ nodes, $$\gamma=[1/8,\ldots,1/8]$$ and   $$\Phi_{st}=\left\lbrace\begin{array}{ll} 0.7 & \text{ if }(s,t)\in\{(1,2),(2,3),\ldots,(7,8),(8,1)\}\\ 0 & \text{ otherwise. } \end{array}\right.$$ (6.6) The perturbing graph is $$[\tilde{G}=(V,\tilde{E}),\tilde{\tau}]\sim SBM(k,\gamma,\tilde{\Phi})$$ where $$\tilde{\Phi}=\epsilon \mathbf{1}\mathbf{1}^T$$, where $$\mathbf{1}$$ is the vector of ones and $$\epsilon\in [0,1]$$ is a parameter controlling the magnitude of the perturbation. Since the blocks are balanced, the probabilities of connection between consecutive blocks are equal and the perturbation is uniform, the nodes of the perturbed block-cycle $$\hat{G}=(V,\hat{E}=E\cup\tilde{E})$$ have homogeneous in- and out-degrees. Hence, the theoretical perturbation bounds of Theorem 3.2 are lower than for an inhomogeneous block-cycle and, in particular, we expect $$\sigma\simeq 1$$, $$\rho$$ approximately equal to the proportion of perturbing edges and $$\Vert f\Vert_2\simeq \frac{1}{\sqrt{n}}$$. In the remainder of this section, the number of Arnoldi iterations applied for the computation of the eigenvalues (until convergence) varies between $$3000$$ and $$10\,000$$ and it tends to be higher when the perturbation increases.
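The cycle eigenvalues in question are extremal eigenvalues of the transition matrix, which can be obtained with an implicitly restarted Arnoldi method. A sketch using SciPy's ARPACK wrapper (the helper name and the toy cycle are ours; the paper's actual implementation is not shown):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigs

def cycle_eigenvalues(A, n_ev):
    """Largest-modulus eigenvalues of the transition matrix P = D_out^{-1} A,
    computed with ARPACK's implicitly restarted Arnoldi method."""
    A = sp.csr_matrix(A, dtype=float)
    d_out = np.asarray(A.sum(axis=1)).ravel()
    P = sp.diags(1.0 / d_out) @ A          # row-stochastic transition matrix
    return eigs(P, k=n_ev, which='LM', return_eigenvectors=False)

# Toy check: for a plain directed cycle on 8 nodes, the eigenvalues of P are
# the 8th roots of unity, so the computed cycle eigenvalues have modulus 1.
n = 8
A = np.zeros((n, n))
A[np.arange(n), (np.arange(n) + 1) % n] = 1
vals = cycle_eigenvalues(A, n_ev=4)
```

For a large sparse graph, the cost per Arnoldi iteration is linear in the number of edges, consistent with the complexity claims of the paper.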
Figure 7a displays the evolution of the magnitude of the perturbation on the right eigenvectors (norm $$\Vert u^l-\hat{u}^l\Vert_2$$ of the difference between the right cycle eigenvectors $$u^l$$ and $$\hat{u}^l$$ of the original and the perturbed block-cycle, respectively) when increasing parameter $$\epsilon$$. The perturbations are displayed as functions of the relative number $$\frac{(|\hat{E}|-|E|)}{|E|}$$ of perturbing edges instead of $$\epsilon$$. It is observed that the perturbation on the right eigenvectors is limited for $$|E|^{-1}(|\hat{E}|-|E|)<4$$, for which $$\Vert u^l-\hat{u}^l\Vert_2\leq 0.5$$ for all $$l$$, while for $$|E|^{-1}(|\hat{E}|-|E|)\geq 4$$, the average of $$\Vert u^l-\hat{u}^l\Vert_2$$ over $$l$$ is above $$0.8$$. Moreover, we observe that the so-called first right cycle eigenvector $$u^1$$ defined by   $$u^1_j=\frac{1}{\sqrt{n}}e^{2\pi i\frac{l-1}{k}}\text{ if node }j\text{ is in block }l$$ (6.7) is subject to a maximal perturbation. In Section 3, we mentioned the fact that, for an unperturbed block-cycle, computing $$u^1$$ only and clustering its entries in $$\mathbb{C}$$ is sufficient to recover the blocks. This eigenvector is however more sensitive to perturbations, as can be seen from Figure 7a. This motivates the use of $$\lfloor \frac{k}{2}\rfloor$$ right cycle eigenvectors as the BCS clustering algorithm does. The upper bound on $$\Vert u^l-\hat{u}^l\Vert_2$$ provided in Theorem 3.2 is also displayed in Fig. 7a. Although the bound is not tight, it stays approximately within a factor $$10$$ of the average of $$\Vert u^l-\hat{u}^l\Vert_2$$ over $$l$$. Fig. 7. Evolution of the norm of the difference between the normalized right cycle eigenvectors of block-cycle $$G$$ and perturbed block-cycle $$\hat{G}$$ of $$1000$$ nodes (a) and of the difference between the cycle eigenvalues (b) as a function of the relative perturbation magnitude $$|E|^{-1}(|\hat{E}|-|E|)$$.
Regarding the eigenvalues, Fig. 7b displays the evolution of the perturbation $$|\lambda_l-\hat{\lambda}_l|$$ of the cycle eigenvalues as a function of $$|E|^{-1}(|\hat{E}|-|E|)$$. For a given proportion of perturbing edges, the perturbation $$|\lambda_l-\hat{\lambda}_l|$$ is approximately the same for each cycle eigenvalue $$l$$ and it grows fast, reaching a value above $$0.5$$ for $$|E|^{-1}(|\hat{E}|-|E|)>1$$. The theoretical bound provided by Theorem 3.2 stays approximately within a factor $$4$$ of the perturbation $$|\lambda_l-\hat{\lambda}_l|$$ on the eigenvalues. To further analyse the effect of the perturbation on the cycle eigenvalues and right cycle eigenvectors, we measure the evolution of the quantities involved in the perturbation bounds of Theorem 3.2. In Fig. 8a, we observe that parameter   $$\sigma=\underset{(i,j)\in\hat{E}}{\max}\text{ }\frac{\hat{d}_j^{in}}{\hat{d}_i^{out}}$$ (6.8) stays approximately constant and equal to $$1$$ regardless of the number of perturbing edges. This is consistent with what we announced in Section 3 for homogeneous and uniformly perturbed block-cycles. This observation also confirms the validity of Inequality 3.13 (a sufficient condition for the recovery of the blocks of the block-cycle). We observe that $$\sigma$$ actually decreases slightly as the perturbation increases, since a uniform perturbation tends to favour the homogeneity of the in- and out-degrees of the nodes of the graph. Figure 8b displays the evolution of $$\rho$$. As can be seen, we have $$\rho\simeq |E|^{-1}(|\hat{E}|-|E|)$$ since the perturbation is uniform.
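Both quantities can be measured directly from the adjacency matrices of the original and the perturbed graph. A small sketch (the function name is ours; $$\rho$$ is taken as the maximum relative increase of the out-degrees, following the perturbation bound of Section 3):

```python
import numpy as np

def sigma_rho(A_orig, A_pert):
    """sigma = max over edges (i, j) of the perturbed graph of d_j^in / d_i^out
    (Equation (6.8)); rho = max_i (dhat_i^out - d_i^out) / d_i^out, the maximum
    relative perturbation of the out-degrees."""
    d_out_hat = A_pert.sum(axis=1)
    d_in_hat = A_pert.sum(axis=0)
    i, j = np.nonzero(A_pert)
    sigma = float(np.max(d_in_hat[j] / d_out_hat[i]))
    d_out = A_orig.sum(axis=1)
    rho = float(np.max((d_out_hat - d_out) / d_out))
    return sigma, rho

# Unperturbed directed 3-cycle: every in- and out-degree is 1, so sigma = 1, rho = 0
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
sigma, rho = sigma_rho(A, A)
```

Adding perturbing edges to `A_pert` increases both quantities; for a uniform perturbation, $$\rho$$ tracks the proportion of perturbing edges, as observed in Fig. 8b.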
Finally, defining $$R$$ as $$\hat{P}-P$$, namely the difference between the transition matrices of $$\hat{G}$$ and $$G$$ respectively, Fig. 8c displays the evolution of $$\Vert R\Vert_2$$ as a function of the relative number of perturbing edges. The curve is concave and monotonically increasing, which is consistent with our proof of $$\Vert R\Vert_2\leq \sqrt{2\sigma\rho}$$ found in Appendix A.1. We observe overall that the curves of the perturbation on the eigenvalues $$|\hat{\lambda}_l-\lambda_l|$$ and of $$\Vert R\Vert_2$$ follow a similar pattern. Fig. 8. Evolution of quantities $$\sigma$$ (a), $$\rho$$ ((b) according to Theorem 3.2) and $$\Vert R\Vert_2$$ ((c) according to Appendix A.1) for the perturbed block-cycle $$\hat{G}$$ as a function of the relative perturbation magnitude $$|E|^{-1}(|\hat{E}|-|E|)$$. Although the perturbations on the right cycle eigenvectors and cycle eigenvalues are large for $$|E|^{-1}(|\hat{E}|-|E|)>1$$, this does not prevent our BCS clustering algorithm from detecting the blocks in the perturbed block-cycle. Indeed, we observe in Fig. 9a that the block assignment error is close to zero for $$|E|^{-1}(|\hat{E}|-|E|)<8$$ (i.e. for $$\epsilon\leq 0.7$$). Despite the perturbation on the spectrum, the success of our BCS algorithm in the presence of a uniform perturbation is due to the fact that the rows of the matrix $$\hat{U}$$ of right cycle eigenvectors still cluster according to the true blocks, as can be seen in Fig. 9b: the plot displays the Davies–Bouldin index [37] of the rows of the right cycle eigenvector matrix $$\hat{U}$$ when they are clustered according to the block membership function $$\tau$$.
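This measurement (the clustering quality of the rows of $$\hat{U}$$ grouped by the true membership) is readily available in scikit-learn; a sketch on toy data (variable names and the synthetic rows are ours; for a complex-valued eigenvector matrix one would first stack real and imaginary parts into real features):

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

# Stand-in for the rows of an eigenvector matrix, grouped by true blocks:
# two tight, well-separated groups should yield a low Davies-Bouldin index.
rng = np.random.default_rng(0)
U_hat = np.vstack([rng.normal(0.0, 0.01, (50, 2)),
                   rng.normal(5.0, 0.01, (50, 2))])
tau = np.repeat([0, 1], 50)

db = davies_bouldin_score(U_hat, tau)  # low value = cohesive, separated blocks
```

A low index for the true partition is exactly the behaviour reported in Fig. 9b for moderate perturbation magnitudes.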
The Davies–Bouldin index $$DB$$ measures the quality of the partitioning as   $$DB=\frac{1}{K}\sum\limits_{r=1}^K\underset{s\neq r}{\max}\frac{\delta_r+\delta_s}{d(c_r,c_s)}$$ (6.9) where $$\delta_r$$ represents the average Euclidean distance between the rows corresponding to the nodes in block $$r$$, and $$d(c_r,c_s)$$ is the distance between the centroids of blocks $$r$$ and $$s$$. The Davies–Bouldin index is low for a perturbation magnitude $$|E|^{-1}(|\hat{E}|-|E|)<8$$, which suggests that the blocks of the block-cycle define cohesive clusters of the rows of matrix $$\hat{U}$$. Hence, in spite of the perturbation on the spectrum of the transition matrix, the clustering property is preserved, which makes it possible to detect the blocks with a low error. This clustering property could be the object of a future theoretical analysis of the efficiency of our BCS algorithm. Fig. 9. Evolution of the block membership error (a) and the Davies–Bouldin index (b), a measure of the clustering properties of the rows of the right cycle eigenvector matrix $$\hat{U}$$, as a function of the relative perturbation magnitude $$|E|^{-1}(|\hat{E}|-|E|)$$. 6.2.2 Block-cycle with imbalanced blocks and inhomogeneous degrees The primary quantities that are impacted by the presence of imbalanced blocks and inhomogeneous degrees are the Perron eigenvector and the Drazin generalized inverse involved in the perturbation bounds of Theorem 3.2.
We analyse the impact of the norm $$\Vert f\Vert_2$$ of the Perron eigenvector $$f$$ (verifying $$f^TP=f^T$$ and $$f^T\mathbf{1}=1$$) and the norm of the Drazin generalized inverse $$\Vert (\lambda_lI-P)^{\#}\Vert_2$$ on the output of the BCS clustering algorithm when the blocks have imbalanced sizes and the degrees are inhomogeneous. For a block-cycle $$G$$ generated by a stochastic blockmodel with balanced blocks and homogeneous degrees, the Perron eigenvector is expected to have constant entries and $$\Vert f\Vert_2$$ should achieve the minimal value of $$\frac{1}{\sqrt{n}}$$. However, the norms of both the Perron eigenvector and the Drazin generalized inverse are expected to grow when the degrees are not homogeneous. To produce inhomogeneous degrees, we generate a block-cycle $$G$$ with imbalanced blocks from a stochastic blockmodel $$SBM(k,\gamma,\Phi)$$ with $$k=8$$, $$\Phi$$ as in Equation 6.6 and $$\gamma$$ sampled from a Dirichlet distribution $$dir(\alpha\mathbf{1}_k)$$ where $$\mathbf{1}_k$$ is the $$k$$-dimensional vector of ones. This distribution generates $$k$$-dimensional vectors whose entries sum to $$1$$; it tends to produce constant vectors for high values of $$\alpha$$ and vectors with imbalanced entries for low values of $$\alpha$$ [38]. Figure 10a and b show the evolution of the norm $$\Vert f\Vert_2$$ of the Perron eigenvector and the maximum norm $$\max_l\Vert (\lambda_lI-P)^{\#}\Vert_2$$ of the Drazin generalized inverse, respectively, as a function of the entropy   $$H(\gamma)=-\sum\limits_{l=1}^k\gamma_l\log_{10}(\gamma_l)$$ (6.10) of the sampled block distribution $$\gamma$$. As expected, the norms of both the Perron eigenvector and the Drazin generalized inverse decrease when the entropy of $$\gamma$$ increases, namely when the distribution of the nodes over the blocks tends to be uniform and the blocks tend to have balanced sizes.
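The two quantities tracked here can be computed directly: $$f$$ is the left eigenvector of $$P$$ for eigenvalue $$1$$, and $$H(\gamma)$$ follows Equation (6.10). A dense-matrix sketch (function names and the toy cycle are ours):

```python
import numpy as np

def perron_vector(P):
    """Left eigenvector f of a row-stochastic matrix P with f^T P = f^T,
    normalized so that f^T 1 = 1 (stationary distribution)."""
    w, V = np.linalg.eig(P.T)
    f = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return f / f.sum()

def entropy(gamma):
    """Equation (6.10): H(gamma) = -sum_l gamma_l log10(gamma_l)."""
    gamma = np.asarray(gamma, dtype=float)
    return float(-np.sum(gamma * np.log10(gamma)))

# For a homogeneous block-cycle the walk is doubly stochastic, so f is uniform
# and ||f||_2 attains its minimum 1/sqrt(n); a plain directed n-cycle shows this.
n = 8
P = np.roll(np.eye(n), 1, axis=1)  # transition matrix of a directed n-cycle
f = perron_vector(P)
```

For imbalanced blocks, $$f$$ concentrates on the small blocks and $$\Vert f\Vert_2$$ exceeds $$1/\sqrt{n}$$, matching the trend in Fig. 10a.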
When $$H(\gamma)>0.7$$, the norm of the Drazin inverse $$\max_l\Vert (\lambda_lI-P)^{\#}\Vert_2$$ drops significantly. Fig. 10. (a) Evolution of the norm $$\Vert f\Vert_2$$ of the Perron eigenvector, i.e. the positive real vector verifying $$f^TP=f^T$$ and $$f^T\mathbf{1}=1$$, (b) the maximum norm $$\max_l\Vert (\lambda_lI-P)^{\#}\Vert_2$$ of the Drazin generalized inverse, and (c) the norm $$\Vert u^l-\hat{u}^l\Vert_2$$ of the difference between the cycle eigenvectors of a block-cycle $$G$$ with imbalanced blocks and a perturbed version $$\hat{G}$$ of $$G$$ as a function of the entropy of the distribution $$\gamma$$ over the blocks for a block-cycle generated by a stochastic block-model $$SBM(8,\gamma,\Phi)$$ with a constant probability of connection of $$0.7$$ between consecutive blocks. In order to assess the effect of imbalanced blocks, inhomogeneous degrees and higher values for $$\Vert f\Vert_2$$ and $$\Vert (\lambda_lI-P)^{\#}\Vert_2$$ on the efficiency of our algorithm, we further measure the perturbation of the right cycle eigenvectors and the block assignment error of our method.
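The block assignment error of Definition 6.2 reduces to a maximum-weight matching between true and estimated labels, solvable with the Hungarian algorithm; a sketch with SciPy's solver (the function name is ours):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def block_membership_error(tau, eta, k):
    """Definition 6.2: fraction of misassigned nodes, minimized over all
    permutations of the k block labels (Hungarian algorithm)."""
    # C[i, j] = number of nodes with true block i and estimated block j
    C = np.zeros((k, k), dtype=int)
    np.add.at(C, (tau, eta), 1)
    # Maximizing total agreement = minimizing mismatches over permutations
    row, col = linear_sum_assignment(C, maximize=True)
    return 1.0 - C[row, col].sum() / len(tau)
```

For the NMI-based error, scikit-learn's `normalized_mutual_info_score` with its default arithmetic averaging uses the normalization $$2I/(H(\Omega)+H(C))$$ of Definition 6.3, so the error is simply one minus that score.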
We generate a block-cycle from a stochastic blockmodel $$SBM(k,\gamma,\Phi)$$ with $$k$$ and $$\Phi$$ as before and   $$\gamma=[0.076,0.17,0.18,0.25,0.069,0.039,0.1]$$ (6.11) such that $$H(\gamma)\simeq 0.8$$, in order to produce imbalanced blocks. The perturbed version $$\hat{G}$$ of the block-cycle $$G$$ is then generated as previously. Figure 10c shows the evolution of the perturbation $$\Vert u^l-\hat{u}^l\Vert_2$$ on the right cycle eigenvectors as a function of the perturbation magnitude, which is severe even when $$|E|^{-1}(|\hat{E}|-|E|)=1$$. To provide more insight into the maximum relative perturbation that can be tolerated by our BCS clustering algorithm, we analyse Equation 3.12, which provides a bound on the maximum relative perturbation under which our BCS algorithm is theoretically guaranteed to recover the blocks. In particular, Equation 3.12 states that the maximum relative perturbation $$\rho$$ on the out-degrees of the nodes of a block-cycle should be such that   $$\underset{i}{\max}\text{ }\left(\frac{\hat{d}_i^{out}-d_i^{out}}{d_i^{out}}\right)=\rho\leq \frac{1}{4kn\sigma}\left(\underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2^2\right)^{-1}.$$ (6.12) Figure 11a shows the evolution of the right-hand side of this inequality as a function of the entropy of the block distribution. Since the curve increases with the entropy, it confirms that, when the blocks are balanced, our BCS clustering algorithm is more robust to perturbations. However, the perturbation bound varies between $$10^{-5}$$ and $$10^{-6}$$, which is low compared to the volume of perturbation that a block-cycle may actually suffer without affecting our BCS clustering algorithm. Indeed, Fig. 11b displays the block membership error as a function of the volume of perturbation for the imbalanced block-cycle generated by the stochastic blockmodel $$SBM(k,\gamma,\Phi)$$ described above. The block membership error stays below $$20\%$$ for $$|E|^{-1}(|\hat{E}|-|E|)<7$$.
Hence, when the blocks are imbalanced, in spite of a higher perturbation on the spectrum of the transition matrix due to larger values of the Drazin inverse, the clustering properties of the cycle eigenvectors are verified and our BCS clustering algorithm is also able to recover the blocks of vertices. Fig. 11. Evolution of the maximum perturbation allowed according to Equation 3.13 as a function of the entropy of the block distribution (a) and evolution of the block membership error achieved by the BCS clustering algorithm as a function of the perturbation magnitude $$|E|^{-1}(|\hat{E}|-|E|)$$ (b). 6.3 Competing methods for directed graph clustering We briefly describe each directed graph clustering algorithm that is tested in the next two sections and compared with our BCS and BAS clustering algorithms. In each case, the approach is described, along with the algorithms involved and the setting of parameters. Reichardt’s blockmodelling approach [7] (REI) discovers functional classes of a network by defining an image graph describing the allowed connections between blocks of nodes. The proposed quality function measures whether, for each edge of the original graph, an edge is connecting the corresponding blocks of nodes in the image graph, the contribution of each edge $$(i,j)$$ being weighted by $$1-d^{out}_id^{in}_j/M$$ where $$M$$ is the total weight of the edges of the graph. An algorithm which iteratively updates the block assignment and the image graph to obtain an approximate maximum of the quality function is provided. The algorithm halts whenever consecutive updates do not improve the quality function.
A degree-corrected stochastic blockmodel (DEG) is proposed in [10]: in addition to a matrix of probabilities of connection between the blocks, a variable describing the expected degree of each node is introduced, which allows variations in the degrees of the nodes within the blocks. An iterative Expectation-Maximization algorithm is provided for the estimation of the blocks of nodes. The clustering algorithm for content-based networks [11] (CB) also alleviates the weaknesses of the traditional stochastic blockmodel by defining both node-to-block and block-to-node probabilities of connection. The parameters of the model, including the block assignment of the nodes, are learnt by an iterative Expectation-Maximization algorithm. The SVD-based algorithm presented in [8] (SVD) relies on the singular value decomposition $$W=U\Sigma V^T$$ of the adjacency matrix. Given a target dimension $$d$$, the top $$d$$ left and right singular vectors $$\tilde{U}$$ and $$\tilde{V}$$ are scaled with the corresponding singular values and concatenated to form a matrix $$\tilde{Z}=[\tilde{U}\tilde{\Sigma}^{\frac{1}{2}}|\tilde{V}\tilde{\Sigma}^{\frac{1}{2}}]$$. The rows of $$\tilde{Z}$$ define $$2d$$-dimensional node embeddings which are then clustered using the k-means clustering algorithm to recover blocks of nodes. This method provides a consistent estimation of the blocks of vertices in a stochastic blockmodel [8]. The QR algorithm is used for the computation of the singular values and the singular vectors [30]. The number $$k$$ of clusters is provided as a parameter of the method. More details about the choice of the target dimension $$d$$ will be given further. Finally, our methods are also compared to two types of symmetrizations applied to graph-related matrices and coupled with the traditional spectral clustering algorithm for undirected graphs. First, the bibliometric symmetrization [9] (BIB) builds the symmetric matrix $$W_u=(1-\alpha)WW^T+\alpha W^TW$$.
This matrix defines the adjacency matrix of an undirected graph. Moreover, to account for the presence of hubs and authorities in the original directed graph, which tend to be connected to an excessive number of nodes in the undirected graph, the matrix is further scaled in terms of in- and out-degrees, namely $$W_u=(1-\alpha)D_{out}^{-1/2}WD_{in}^{-1/2}W^TD_{out}^{-1/2}+\alpha D_{in}^{-1/2}W^TD_{out}^{-1/2}WD_{in}^{-1/2}$$ as suggested in [9]. To yield an equal weight for the co-coupling and the co-citation matrices, we set the value of parameter $$\alpha$$ to $$0.5$$. The spectral clustering algorithm with a symmetric Laplacian matrix $$I-D^{-1/2}WD^{-1/2}$$ (with $$D$$ denoting the degree matrix) is then applied to the undirected graph defined by matrix $$W_u$$ [13], providing the number $$k$$ of clusters as a parameter. Second, the clustering algorithm based on a two-step random walk (RW) [6] starts by defining a two-step random walk in which a random walker alternately moves forward, following the directionality of the edges, and backwards, in the opposite direction. The resulting transition matrix $$P$$ is scaled and symmetrized to form a symmetric Laplacian matrix $$L = I-\frac{1}{2}(\Pi^{\frac{1}{2}}P\Pi^{-\frac{1}{2}}+\Pi^{-\frac{1}{2}}P^T\Pi^{\frac{1}{2}})$$ where $$\Pi$$ is the matrix whose diagonal contains the stationary probabilities of the two-step random walk defined by $$P$$. When applied to this Laplacian matrix $$L$$, the classical spectral clustering algorithm [13] is theoretically able to identify clusters of nodes with a high volume of co-citations or co-references (i.e. that share similar in- or out-neighbours). The k-means algorithm is used as a part of the spectral clustering algorithm and the number $$k$$ of clusters is provided as a parameter.
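The degree-scaled bibliometric symmetrization can be sketched as follows (the function name and the clipping of zero degrees, added to avoid division by zero, are ours):

```python
import numpy as np

def bibliometric_symmetrization(W, alpha=0.5):
    """Degree-scaled bibliometric symmetrization:
    W_u = (1 - alpha) * Dout^-1/2 W Din^-1/2 W^T Dout^-1/2
        +      alpha  * Din^-1/2 W^T Dout^-1/2 W Din^-1/2."""
    d_out = np.maximum(W.sum(axis=1), 1.0)  # clip to >= 1 (our guard)
    d_in = np.maximum(W.sum(axis=0), 1.0)
    So = np.diag(1.0 / np.sqrt(d_out))      # Dout^-1/2
    Si = np.diag(1.0 / np.sqrt(d_in))       # Din^-1/2
    return ((1 - alpha) * So @ W @ Si @ W.T @ So
            + alpha * Si @ W.T @ So @ W @ Si)

rng = np.random.default_rng(1)
W = (rng.random((6, 6)) < 0.4).astype(float)  # random directed toy graph
W_u = bibliometric_symmetrization(W)
```

The resulting matrix is symmetric by construction, so the classical undirected spectral clustering pipeline applies to it directly.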
For the computation of eigenvalues in the spectral clustering algorithm coupled both with BIB and RW symmetrizations, the QR algorithm is used [30] and the eigenvectors are normalized to unit Euclidean norm. The algorithms were implemented by us, following the instructions provided by the authors of each method. We only consider unweighted graphs but our observations are valid with positive edge weights. In addition to the methods described above, our algorithms are also compared to slight variations of SVD, BIB and RW methods. Indeed, SVD, RW and BIB methods involve the application of k-means algorithm to cluster spectral embeddings of the nodes. Since k-means algorithm performs poorly on noisy and highly perturbed data samples, we also test versions SVD-DBSCAN, RW-DBSCAN and BIB-DBSCAN of SVD, RW and BIB methods, in which the k-means algorithm is replaced by the DBSCAN clustering algorithm [39] which is less sensitive to the presence of noise and is more flexible in the shape of the clusters it detects. The parameter defining the number of points in a sample’s neighbourhood is set to $$3$$. The parameter $$\epsilon$$ (minimum distance for two samples to be considered neighbours) is adjusted by iteratively applying DBSCAN until the right number $$k$$ of clusters is found. 6.4 Experiments for BCS clustering algorithm For our first experiment, we generate an unperturbed block-cycle $$G$$ with block membership function $$\tau$$ such that $$[G,\tau]\sim SBM(k,\gamma,\Phi)$$ with $$k=8$$ and   $$\begin{array}{rcl} \gamma &=& [0.18,0.2,0.05,0.2,0.14,0.04,0.07,0.13]\\ \Phi_{st}&=&\left\lbrace\begin{array}{ll} 0.7 & \text{ if }(s,t)\in\{(1,2),(2,3),\ldots,(7,8),(8,1)\}\\ 0 & \text{ otherwise } \end{array}\right. \end{array}\!\!\!\!\!\!.$$ (6.13) where entries of $$\gamma$$ were chosen randomly for obtaining imbalanced blocks of nodes. 
We generate a perturbing graph $$\tilde{G}$$ with block membership function $$\tilde{\tau}$$ such that $$[\tilde{G},\tilde{\tau}]\sim SBM(k,\gamma,\tilde{\Phi})$$ with parameter $$\tilde{\Phi}$$ of the form $$\tilde{\Phi}=\epsilon Q$$, where $$\epsilon\in [0,1]$$ controls the magnitude of the perturbation and $$Q\in [0,1]^{k\times k}$$ is generated with random entries in $$[0,1]$$. We compare the partitioning computed by the BCS clustering algorithm to the ones returned by all the other algorithms mentioned above. Regarding the SVD clustering algorithm, the target dimension $$d$$ is chosen as the rank of matrix $$\Phi$$ ($$k$$ in this case), in accordance with the original form of the SVD clustering algorithm [8]. Figure 12 shows the evolution of the block membership error and the NMI-based error as a function of the perturbation magnitude $$\epsilon$$ for $$n=1000$$ nodes. As $$\epsilon=1$$ would completely hide the block-cyclic structure of the graph, we restrict ourselves to $$\epsilon\in [0,0.9]$$. Firstly, we observe that the BCS clustering algorithm produces an error that is lower than the errors obtained with the other algorithms, both in terms of block membership error and NMI. Secondly, we see that even in the presence of an $$80\%$$ perturbation, the block membership error achieved by the BCS clustering algorithm is below $$30\%$$ while all other algorithms produce an error above $$40\%$$. Our BCS clustering algorithm is thus more robust in the presence of perturbation. Additionally, since the noise itself is derived from a stochastic blockmodel, we also display the performance of the algorithms in the detection of the noisy block membership defined by $$\tilde{\tau}$$ in order to check to what extent the methods are impacted by the noise. Figure 13 shows the evolution with $$\epsilon$$ of the block membership error and the NMI-based error achieved in the detection of the noisy block membership function $$\tilde{\tau}$$.
As can be seen from the graphs, our BCS clustering algorithm achieves the largest error regardless of $$\epsilon$$. All the error measures decrease with $$\epsilon$$ since each method tends to detect the noisy blocks when $$\epsilon$$ exceeds $$0.5$$, except our BCS clustering algorithm which always achieves a block membership error above $$0.6$$. This confirms the ability of our BCS clustering algorithm to focus on the block-cyclic structure while neglecting any other pattern. Fig. 12. Comparison of the block membership errors (a) and errors based on NMI (b) in the approximation of $$\tau$$ of our BCS algorithm and SVD, BIB, RW, REI, CB and DEG clustering algorithms as a function of the magnitude of the perturbation (parameter $$\epsilon$$) on a block-cyclic graph of $$n=1000$$ nodes. Fig. 13. Comparison of the block membership errors (a) and errors based on NMI (b) in the approximation of the noisy blocks expressed by $$\tilde{\tau}$$ of our BCS algorithm and SVD, BIB, RW, REI, CB and DEG clustering algorithms as a function of the magnitude of the perturbation (parameter $$\epsilon$$) on a block-cyclic graph of $$n=1000$$ nodes. In Fig.
14a, we also display the evolution of the NMI-based error as a function of the perturbation for the SVD-DBSCAN, RW-DBSCAN and BIB-DBSCAN algorithms compared to that of our BCS algorithm. While the use of the DBSCAN algorithm does not induce any significant change in the outputs of the BIB and SVD clustering algorithms, it does improve the performance of the RW algorithm. However, despite the robustness of the DBSCAN algorithm to variations in the shapes of the clusters, our BCS clustering algorithm still outperforms these three methods, since the error produced by SVD-DBSCAN, BIB-DBSCAN and RW-DBSCAN is above $$0.4$$ for a perturbation magnitude of $$80\%$$. Figure 14b provides an explanation for this phenomenon by displaying the NMI-based error in the detection of the noisy block membership function $$\tilde{\tau}$$: it is seen that the SVD-DBSCAN, RW-DBSCAN and BIB-DBSCAN algorithms tend to detect the blocks of the perturbation graph $$\tilde{G}$$ when the magnitude of the perturbation grows, while our BCS clustering algorithm always produces an NMI-based error above $$0.6$$ in the detection of the noisy blocks. Fig. 14. Comparison of the NMI-based errors in the approximation of $$\tau$$ (a) and in the approximation of the noisy blocks expressed by $$\tilde{\tau}$$ (b) for our BCS algorithm and SVD-DBSCAN, BIB-DBSCAN and RW-DBSCAN clustering algorithms as a function of the magnitude of the perturbation (parameter $$\epsilon$$) on a block-cyclic graph of $$n=1000$$ nodes.
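The parameter-adjustment step of the DBSCAN variants (tuning $$\epsilon$$ until exactly $$k$$ clusters emerge) can be sketched as a grid scan; the grid, helper name and toy data below are our reading of this step, not the authors' exact procedure:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def dbscan_with_k_clusters(Z, k, min_samples=3, eps_grid=None):
    """Scan the DBSCAN distance threshold eps until exactly k clusters are
    found (noise points, labelled -1, are ignored in the count)."""
    if eps_grid is None:
        eps_grid = np.linspace(0.01, 2.0, 200)
    for eps in eps_grid:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(Z)
        if len(set(labels) - {-1}) == k:
            return labels
    return None  # no eps in the grid yields k clusters

# Toy embedding with two tight groups: the scan stops at a small eps
Z = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
labels = dbscan_with_k_clusters(Z, k=2)
```

Unlike k-means, this variant can leave outlying embeddings unassigned, which is what makes it less sensitive to noisy samples.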
For our second experiment, we generate two graphs following the same stochastic blockmodel $$SBM(k,\gamma,\Phi)$$ described above, each containing $$500$$ nodes. Then we combine them using Equation 6.1 with parameter $$\alpha=0.1$$ and a perturbation magnitude of $$\epsilon = 0.2$$. Table 1 shows the results obtained. As in the previous experiment, parameter $$d$$ of the SVD clustering algorithm is set to $$k$$ (the rank of matrix $$\Phi$$). As expected, our method perfectly recovers the block membership of nodes while the other methods all produce block membership errors above $$30\%$$. We may also observe that SVD-DBSCAN, BIB-DBSCAN and RW-DBSCAN outperform SVD, BIB and RW (which use the standard k-means algorithm) by $$5$$–$$10\%$$ in block membership error, which suggests that the use of the DBSCAN algorithm reduces the sensitivity to perturbations.

Table 1 Experiment based on two block-cycles of $$500$$ nodes generated by stochastic blockmodel $$SBM(k,\gamma,\Phi)$$ and combined to form a block-cycle of $$1000$$ nodes based on Equation 6.1 with $$\alpha=0.1$$

Method       Block membership error   1-NMI
BCS          $$\mathbf{0}$$           $$\mathbf{0}$$
SVD          $$0.5$$                  $$0.47$$
BIB          $$0.49$$                 $$0.47$$
REI          $$0.34$$                 $$0.25$$
RW           $$0.49$$                 $$0.46$$
CB           $$0.40$$                 $$0.49$$
DEG          $$0.61$$                 $$0.58$$
SVD-DBSCAN   $$0.46$$                 $$0.31$$
BIB-DBSCAN   $$0.43$$                 $$0.2$$
RW-DBSCAN    $$0.39$$                 $$0.21$$

Bold indicates the lowest errors achieved among all algorithms.
6.5 Experiment for BAS clustering algorithm

Using the same framework as for block-cycles, in the first experiment we generate an unperturbed block-acyclic graph $$G$$ with block membership function $$\tau$$ such that $$[G,\tau]\sim SBM(k,\gamma,\Phi)$$ with $$k=8$$,

$$\begin{array}{rcl} \gamma &=& [0.18,0.2,0.05,0.2,0.14,0.04,0.07,0.13]\\ \Phi_{st} &=& \left\lbrace\begin{array}{ll} 0.5 & \text{ if }s<t\\ 0 & \text{ otherwise. } \end{array}\right. \end{array}$$ (6.14)

We generate a perturbing graph $$\tilde{G}$$ with block membership function $$\tilde{\tau}$$ such that $$[\tilde{G},\tilde{\tau}]\sim SBM(k,\gamma,\tilde{\Phi})$$ with parameter $$\tilde{\Phi}$$ of the form $$\tilde{\Phi}=\epsilon Q$$, where $$\epsilon\in [0,1]$$ and $$Q\in [0,1]^{k\times k}$$ has random entries in $$[0,1]$$. Figure 15 displays the result for $$n=1000$$ nodes (the target dimension $$d$$ for the SVD clustering algorithm is chosen as the rank of matrix $$\Phi$$, which is $$k-1$$).
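As an illustration, the generative model of Eq. (6.14) can be sampled as follows. Standard stochastic-blockmodel edge semantics are assumed (each directed edge $$(u,v)$$ drawn independently with probability $$\Phi_{\tau(u)\tau(v)}$$), and blocks are indexed from 0 rather than 1.

```python
import random


def sample_sbm(n, gamma, phi, seed=0):
    """Sample a directed graph from SBM(k, gamma, phi): node i gets
    block tau[i] drawn from gamma, then each edge (u, v) is present
    independently with probability phi[tau[u]][tau[v]].
    """
    rng = random.Random(seed)
    k = len(gamma)
    tau = [rng.choices(range(k), weights=gamma)[0] for _ in range(n)]
    edges = [(u, v) for u in range(n) for v in range(n)
             if u != v and rng.random() < phi[tau[u]][tau[v]]]
    return tau, edges


# Block-acyclic connectivity of Eq. (6.14): blocks only feed forward.
k = 8
gamma = [0.18, 0.2, 0.05, 0.2, 0.14, 0.04, 0.07, 0.13]
phi = [[0.5 if s < t else 0.0 for t in range(k)] for s in range(k)]
tau, edges = sample_sbm(200, gamma, phi)
# Every edge goes from a lower block to a strictly higher one.
assert all(tau[u] < tau[v] for u, v in edges)
```

Because $$\Phi$$ is strictly upper triangular, the sampled graph is block-acyclic by construction; the perturbing graph $$\tilde{G}$$ is obtained by calling the same sampler with $$\tilde{\Phi}=\epsilon Q$$.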
We observe that the BAS clustering algorithm achieves the smallest block membership error and NMI-based error (or close to the smallest for a perturbation of magnitude $$0.3$$). We also display in Fig. 16 the evolution of the block membership error and the NMI-based error as a function of $$\epsilon$$ for the detection of the noisy block memberships expressed by function $$\tilde{\tau}$$. The error produced by our BAS clustering algorithm is significantly higher than that of the other algorithms. Apart from our BAS clustering algorithm, which achieves a block membership error above $$0.6$$ regardless of $$\epsilon$$, the other methods tend to detect the noisy blocks when $$\epsilon$$ increases. As in the block-cyclic case, this confirms the ability of our BAS clustering algorithm to focus on the block-acyclic structure of the graph regardless of other patterns.

Fig. 15. Evolution of the block membership errors (a) and errors based on NMI (b) for our BAS algorithm and for the SVD, BIB, RW, REI, CB and DEG clustering algorithms as a function of the magnitude of the perturbation (parameter $$\epsilon$$) on a block-acyclic graph of $$n=1000$$ nodes.

Fig. 16. Comparison of the block membership errors (a) and errors based on NMI (b) in the approximation of the noisy blocks expressed by $$\tilde{\tau}$$ for our BAS algorithm and the SVD, BIB, RW, REI, CB and DEG clustering algorithms as a function of the magnitude of the perturbation (parameter $$\epsilon$$) on a block-acyclic graph of $$n=1000$$ nodes.
In Fig. 17a, we display the evolution of the NMI-based error for our BAS clustering algorithm and the SVD-DBSCAN, BIB-DBSCAN and RW-DBSCAN algorithms. For a low magnitude of the perturbation ($$\epsilon\leq 0.3$$), the DBSCAN-based versions of the three algorithms perform slightly better than the original ones. Indeed, we can see from Fig. 17b that SVD-DBSCAN, BIB-DBSCAN and RW-DBSCAN also perform slightly better than the original SVD, BIB and RW methods for the detection of the noisy block-membership function $$\tilde{\tau}$$. It can be concluded that the DBSCAN clustering algorithm favours the detection of noisy blocks when the noise is structured. Hence, our BAS clustering algorithm outperforms the three other methods in approximating $$\tau$$, while it produces an NMI error always above $$0.6$$ in the detection of noisy block memberships (Fig. 17b).

Fig. 17. Comparison of the NMI-based error in the approximation of $$\tau$$ (a) and in the approximation of the noisy blocks expressed by $$\tilde{\tau}$$ (b) for our BAS algorithm and the SVD-DBSCAN, BIB-DBSCAN and RW-DBSCAN clustering algorithms as a function of the magnitude of the perturbation (parameter $$\epsilon$$) on a block-acyclic graph of $$n=1000$$ nodes.
For our second experiment, we generate two graphs following the same stochastic blockmodel $$SBM(k,\gamma,\Phi)$$ described above, each containing $$500$$ nodes. Then we combine them using Equation 6.1 with parameter $$\alpha=0.1$$ and a perturbation magnitude of $$\epsilon=0.01$$. Table 2 shows the results obtained (parameter $$d$$ of the SVD clustering algorithm is again set to the rank of $$\Phi$$, $$k-1$$ in this case). Our method recovers the block membership of nodes with a low error, while the other methods all produce block membership errors above $$40\%$$.

Table 2 Experiment based on two block-acyclic graphs of $$500$$ nodes generated by stochastic blockmodel $$SBM(k,\gamma,\Phi)$$ and combined to form a block-acyclic graph of $$1000$$ nodes based on Equation 6.1 with $$\alpha=0.1$$

Method       Block membership error   1-NMI
BAS          $$\mathbf{0.05}$$        $$\mathbf{0.11}$$
SVD          $$0.50$$                 $$0.46$$
BIB          $$0.52$$                 $$0.49$$
REI          $$0.53$$                 $$0.48$$
RW           $$0.40$$                 $$0.38$$
CB           $$0.56$$                 $$0.58$$
DEG          $$0.52$$                 $$0.46$$
SVD-DBSCAN   $$0.42$$                 $$0.38$$
BIB-DBSCAN   $$0.53$$                 $$0.40$$
RW-DBSCAN    $$0.40$$                 $$0.42$$

Bold indicates the lowest errors achieved among all algorithms.
6.6 Finding the number $$k$$ of blocks

In this experiment, Algorithm 3.4 for finding the number of blocks in a block-cyclic or a block-acyclic graph is tested. A perturbed block-cyclic graph is generated from a stochastic blockmodel $$SBM(k,\gamma,\Phi)$$ with

$$\Phi_{st}=\left\lbrace\begin{array}{ll} 0.7 & \text{ if }(s,t)\in\{(1,2),(2,3),\ldots,(k-1,k),(k,1)\}\\ 0 & \text{ otherwise } \end{array}\right.$$ (6.15)

and either $$\gamma=[1/k,\ldots,1/k]$$ to generate balanced blocks or $$\gamma\sim Dir(2\cdot\mathbf{1}_k)$$ to generate imbalanced blocks (as was done in Section 6.2). A perturbing graph is generated from $$SBM(k,\gamma,\tilde{\Phi})$$, where $$\tilde{\Phi}=\epsilon Q$$ with $$Q\in [0,1]^{k\times k}$$ having random entries in $$[0,1]$$ and a perturbation magnitude of $$\epsilon=0.1$$. For a perturbed block-acyclic graph, the same stochastic blockmodels are used but with $$\Phi$$ adapted to form a block-acyclic graph.
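The two connectivity matrices used in this experiment might be built as follows. This is a minimal sketch: the combination of the two graphs via Equation 6.1 is not reproduced here, and blocks are 0-indexed, so the wrap-around pair $$(k,1)$$ of Eq. (6.15) becomes $$(k-1,0)$$.

```python
import random


def cyclic_phi(k, p=0.7):
    """Connectivity matrix of Eq. (6.15): block s feeds only the
    next block (s + 1) mod k, closing the cycle at the last block."""
    return [[p if t == (s + 1) % k else 0.0 for t in range(k)]
            for s in range(k)]


def perturbing_phi(k, eps=0.1, seed=0):
    """Noise connectivity eps * Q, with Q uniform in [0, 1]^(k x k)."""
    rng = random.Random(seed)
    return [[eps * rng.random() for _ in range(k)] for _ in range(k)]


phi = cyclic_phi(6)
# Each row has exactly one nonzero entry: the next block in the cycle.
assert all(sum(1 for x in row if x > 0) == 1 for row in phi)
assert phi[5][0] == 0.7  # the cycle wraps around
```

The block-acyclic variant mentioned in the text is obtained by replacing `cyclic_phi` with a strictly upper-triangular $$\Phi$$, as in Eq. (6.14).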
Figure 18a and b show the predicted number of blocks for a perturbed block-cycle with different numbers $$k$$ of blocks, respectively for balanced and imbalanced blocks. Both the results of Algorithm 3.4 (using $$\lfloor\frac{k}{2}\rfloor$$ eigenvectors) and those of the method using the first right cycle eigenvector $$u^1$$ only are displayed. Both for balanced and imbalanced blocks, the method using all $$\lfloor\frac{k}{2}\rfloor$$ eigenvectors of interest performs better: for balanced blocks, the average relative error is $$7\%$$ and, for imbalanced blocks, it is $$11\%$$. Figure 19a and b display the results of the same experiments for a perturbed block-acyclic graph. As in the block-cyclic case, the method using all $$\lfloor\frac{k}{2}\rfloor$$ eigenvectors of interest performs better than the one using the first right cycle eigenvector only. For a perturbed block-acyclic graph, the average relative error is $$15\%$$ when the blocks are balanced and $$26\%$$ when they are imbalanced.

Fig. 18. Evolution of the predicted number of clusters versus the actual number of clusters for a perturbed block-cycle with a perturbation magnitude of $$\epsilon=0.1$$ and with balanced blocks (a) and imbalanced blocks (b).

Fig. 19. Evolution of the predicted number of clusters versus the actual number of clusters for a perturbed block-acyclic graph with a perturbation magnitude of $$\epsilon=0.1$$ and with balanced blocks (a) and imbalanced blocks (b).
7. Application to real-world networks

Pure block-cycles are rarely encountered in real-world networks, but block-acyclic networks (or graphs that are close to being block-acyclic) are common. In this section, we apply the BAS clustering algorithm to two real-world networks: a trophic network and a network of Autonomous Systems. Once the block partition $$\tau: V\rightarrow \{1,\ldots,k\}$$ of the nodes of graph $$G=(V,E,W)$$ is obtained, it is of interest to find the ranking of blocks, which can be interpreted as a topological order of blocks. We define a graph $$G^B=(V^B,E^B)$$ where $$V^B=\{1,\ldots,k\}$$ and

$$(m,n)\in E^B \Leftrightarrow \underset{\substack{u:\tau(u)=m\\v:\tau(v)=n}}{\sum}W_{uv}-W_{vu}>0.$$ (7.1)

If graph $$G$$ is indeed close to being block-acyclic, then graph $$G^B$$ is acyclic. The ranking of blocks is then obtained by computing a topological order of the nodes of $$G^B$$ [40], which labels nodes in $$G^B$$ from $$1$$ to $$k$$ such that

$$(i,j)\in E^B \Rightarrow i<j.$$ (7.2)

Hence we relabel each block with its rank as given by the topological order in $$G^B$$. This ranking of blocks can be further used to improve the quality of the block partitioning returned by the BAS clustering algorithm through a simple post-processing step. Indeed, in real-world graphs, the BAS clustering algorithm might confuse consecutive blocks in the hierarchy (for instance assigning a node to block $$b-1$$ or $$b+1$$ instead of its true block $$b$$), as the presence of perturbation causes the separation between the corresponding clusters in the embedding space $$\mathbb{R}^k$$ to become fuzzy.
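The block ranking of Eqs. (7.1)–(7.2) can be sketched with Kahn's algorithm; this is one standard way to compute a topological order, not necessarily the routine of [40], and blocks are 0-indexed here.

```python
from collections import defaultdict, deque


def block_ranking(edges, weights, tau, k):
    """Rank blocks by a topological order of the block graph G^B of
    Eq. (7.1): (m, n) is a block edge when the net weight flowing
    from block m to block n is positive.
    """
    net = defaultdict(float)
    for (u, v), w in zip(edges, weights):
        # W_uv contributes +w to pair (tau(u), tau(v)) and appears
        # as W_vu, hence -w, for the reverse pair.
        net[(tau[u], tau[v])] += w
        net[(tau[v], tau[u])] -= w
    block_edges = {(m, n) for (m, n), w in net.items() if w > 0}
    # Kahn's algorithm: repeatedly emit a block with no predecessor.
    indeg = {b: 0 for b in range(k)}
    succ = defaultdict(set)
    for m, n in block_edges:
        succ[m].add(n)
        indeg[n] += 1
    queue = deque(b for b in range(k) if indeg[b] == 0)
    order = []
    while queue:
        m = queue.popleft()
        order.append(m)
        for n in succ[m]:
            indeg[n] -= 1
            if indeg[n] == 0:
                queue.append(n)
    if len(order) != k:
        raise ValueError("block graph G^B is not acyclic")
    # rank[b] = position (1-based) of block b in the topological order
    return {b: i + 1 for i, b in enumerate(order)}
```

For a three-block chain (all edges pointing down the hierarchy), `block_ranking([(0, 1), (1, 2)], [1, 1], [0, 1, 2], 3)` returns `{0: 1, 1: 2, 2: 3}`.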
Hence, we define a quality criterion $$C_A$$ measuring how close the graph is to being block-acyclic, based on the node partitioning $$\tau$$:

$$C_A=\frac{\vert \{(u,v)\in E:\tau(u)<\tau(v)\}\vert}{\vert E\vert}.$$ (7.3)

The post-processing step considers each node $$i$$, checks whether $$C_A$$ is increased by assigning $$i$$ to block $$\tau(i)+1$$ or $$\tau(i)-1$$, and changes the block membership of $$i$$ if this is the case. This process is repeated once for each node, in a random order. The post-processing step causes a negligible increase in the computation time of the BAS clustering algorithm, but empirical analysis shows that it improves the quality of the result, for instance in the case of the trophic network presented below.

7.1 Finding clusters in trophic networks

The basic idea of trophic networks is to represent all transfers of carbon in an ecosystem as a single directed graph [1], in which nodes are living beings or any other agents that store carbon, such as animals, carbon dioxide in the air, etc. A directed edge $$(A,B)$$ represents a steady transfer of carbon from $$A$$ to $$B$$, for instance a predator–prey relationship where $$B$$ is the predator and $$A$$ is the prey. Considering the trophic network of an isolated ecosystem, we should be able to partition nodes into groups such that nodes belonging to the same group have roughly the same groups of predators and groups of preys. Empirical observations lead to the following five categories [41]: primary producers (plants and algae that produce organic material by photosynthesis), primary consumers (herbivores), secondary consumers (carnivores that hunt herbivores, such as badgers), tertiary consumers (carnivores hunting other carnivores, such as hawks) and top-level predators having no other predators (such as bears). Decomposers are sometimes grouped with primary producers to form level 1, as neither of them hunts for its livelihood.
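The criterion $$C_A$$ of Eq. (7.3) and the greedy post-processing pass described above it might be implemented as follows. This is a minimal sketch with 0-indexed blocks; recomputing $$C_A$$ from scratch at every candidate move is simple but not the most efficient choice.

```python
import random


def acyclicity(edges, tau):
    """Quality criterion C_A of Eq. (7.3): fraction of edges going
    from a lower-ranked block to a strictly higher-ranked one."""
    return sum(1 for u, v in edges if tau[u] < tau[v]) / len(edges)


def postprocess(edges, tau, k, seed=0):
    """One greedy pass: move each node to an adjacent block in the
    hierarchy whenever that strictly increases C_A, visiting the
    nodes in a random order."""
    tau = list(tau)
    order = list(range(len(tau)))
    random.Random(seed).shuffle(order)
    for i in order:
        best, best_ca = tau[i], acyclicity(edges, tau)
        for b in (tau[i] - 1, tau[i] + 1):
            if 0 <= b < k:
                saved = tau[i]
                tau[i] = b
                ca = acyclicity(edges, tau)
                if ca > best_ca:
                    best, best_ca = b, ca
                tau[i] = saved
        tau[i] = best
    return tau
```

On a toy two-node graph whose single edge initially stays inside one block, the pass promotes the head node so that the edge points up the hierarchy: `postprocess([(1, 0)], [0, 0], 2)` returns `[1, 0]`, for which $$C_A=1$$.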
Other types of ‘carbon holders’ may be included in the network, such as the atmosphere, which receives carbon dioxide from all consumers and provides the producers with carbon. A popular way to estimate the role of a species in a food web is the computation of its trophic level, which can be viewed as the level at which the species can be found in a food chain. The trophic level $$T_i$$ of species $$i$$ is defined as $$T_i=1+\sum_{j}T_jp_{ji}$$ where $$p_{ji}$$ denotes the fraction of $$j$$ in the diet of $$i$$ [42]. Hence, if we define $$P$$ as the transition matrix of a food network with associated adjacency matrix $$W$$, then the vector $$T$$ of trophic levels is $$T=(I-P^T)^+\mathbf{1}$$, where $$\mathbf{1}$$ is the column vector of all ones and $$(I-P^T)^+$$ denotes the Moore–Penrose pseudoinverse of $$(I-P^T)$$ [43]. It is clear that the adjacency matrix of a food web has a block upper-triangular shape, and intuition suggests that the corresponding graph is block-acyclic. Our goal is to use the BAS clustering algorithm to partition the agents of a food web into five groups corresponding to the five categories described previously. Then, we check whether the partitioning of nodes that we obtain is consistent with the trophic levels. We apply the BAS clustering algorithm to a trophic network of Florida Bay [44], which consists of $$128$$ species or other relevant agents in the trophic network. Having information about the feeding relationships in the food web, we build a directed graph $$G=(V,E)$$ where a directed unweighted edge connects node $$A$$ to node $$B$$ if $$B$$ feeds on $$A$$. Figure 20 shows the eigenvalues of the transition matrix and the cycle eigenvalues used by the BAS clustering algorithm. We observe that all but a few eigenvalues are close to zero. As we partition nodes into five blocks, we select the five eigenvalues with largest norm.
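For an acyclic web, the trophic levels $$T_i=1+\sum_j T_j p_{ji}$$ can be resolved by direct substitution, starting from the basal species. The column normalization of $$W$$ used below to obtain the diet fractions $$p_{ji}$$ is our reading of the transition-matrix formulation; the pseudoinverse $$T=(I-P^T)^+\mathbf{1}$$ of the text remains the general tool when cycles are present.

```python
def trophic_levels(W):
    """Trophic levels T_i = 1 + sum_j T_j * p_ji, where p_ji is the
    fraction of prey j in the diet of i, taken here as W[j][i]
    normalized over the incoming weights of i (an interpretation
    of the transition-matrix formulation in the text). Direct
    substitution only works for acyclic webs; cyclic ones require
    the pseudoinverse solve instead.
    """
    n = len(W)
    indeg = [sum(W[j][i] for j in range(n)) for i in range(n)]
    T = [None] * n
    remaining = set(range(n))
    while remaining:
        progressed = False
        for i in list(remaining):
            preys = [j for j in range(n) if W[j][i] > 0]
            # A level is computable once all prey levels are known;
            # basal species (no preys) get level 1 immediately.
            if all(T[j] is not None for j in preys):
                T[i] = 1 + sum(T[j] * W[j][i] / indeg[i] for j in preys)
                remaining.discard(i)
                progressed = True
        if not progressed:
            raise ValueError("the food web contains a cycle")
    return T


# Hypothetical 4-node chain: plant -> herbivore -> carnivore -> top
# predator, where the carnivore also eats some plant matter.
W = [[0, 1, 1, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
assert trophic_levels(W) == [1, 2.0, 2.5, 3.5]
```

The carnivore's level of $$2.5$$ reflects a diet split evenly between a level-1 plant and a level-2 herbivore, matching the definition above.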
Fig. 20. Eigenvalues of the transition matrix of the trophic network. The five eigenvalues with largest modulus are the cycle eigenvalues.

Table 3 shows the five blocks extracted by the BAS clustering algorithm. The table also includes the trophic level of each agent and the average trophic level of each cluster. We observe that the block partitioning we obtain is consistent with the empirical classification into five categories described above. Block 1: mostly autotrophs and bacteria; block 2: herbivores and small omnivores (such as gastropods and shrimps); block 3: larger omnivores and carnivores (killifish, crabs, lobsters...); block 4: carnivores (kingfishers, catfish, snappers...); block 5: top-level predators (sharks, cormorants...).

Table 3. Blocks returned by the BAS clustering algorithm along with the trophic level of each agent. Block $$1$$ is at the bottom of the hierarchy, block $$5$$ at the top.
Block 1 (average trophic level $$\mathbf{1.65}$$): Input (0), 2um Spherical Phytoplankt (1), Synedococcus (1), Oscillatoria (1), Small Diatoms ($$<20$$um) (1), Big Diatoms ($$>20$$um) (1), Dinoflagellates (1), Other Phytoplankton (1), Benthic Phytoplankton (1), Thalassia (1), Halodule (1), Syringodium (1), Drift Algae (1), Epiphytes (1), Free Bacteria (2.92), Water Flagellates (3.19), Benthic Flagellates (3.77), Benthic Ciliates (4.1), Meiofauna (4.35).

Block 2 (average trophic level $$\mathbf{3.13}$$): Roots (1), Water Cilitaes (2.9), Acartia Tonsa (2.91), Oithona nana (2.91), Paracalanus (2.91), Other Copepoda (3.36), Meroplankton (3.63), Other Zooplankton (2.91), Sponges (3), Bivalves (3), Detritivorous Gastropods (4.13), Epiphytic Gastropods (2), Predatory Gastropods (5.12), Detritivorous Polychaetes (3.88), Predatory Polychaetes (4.44), Suspension Feeding Polych (3.32), Macrobenthos (4.13), Benthic Crustaceans (3.88), Detritivorous Amphipods (3.88), Herbivorous Amphipods (2.59), Isopods (2), Herbivorous Shrimp (2), Predatory Shrimp (3.9), Pink Shrimp (3.43), Thor Floridanus (2), Detritivorous Crabs (3.72), Omnivorous Crabs (3.79), Sailfin Molly (2), Green Turtle (2).

Block 3 (average trophic level $$\mathbf{4.03}$$): Coral (3.43), Other Cnidaridae (4.26), Echinoderma (3.7), Lobster (4.58), Predatory Crabs (4.49), Callinectus sapidus (4.49), Stone Crab (4.31), Sardines (4.18), Anchovy (4.25), Bay Anchovy (3.95), Toadfish (4.85), Halfbeaks (3.26), Other Killifish (3.7), Goldspotted killifish (4.3), Rainwater killifish (3.82), Silverside (3.95), Other Horsefish (3.97), Gulf Pipefish (4.06), Dwarf Seahorse (3.97), Mojarra (4.32), Porgy (4.26), Pinfish (3.82), Mullet (3.86), Blennies (4.12), Code Goby (4.41), Clown Goby (4.41), Other Demersal Fishes (4.21), DOC (1.92).

Block 4 (average trophic level $$\mathbf{4.69}$$): Rays (4.89), Bonefish (4.81), Lizardfish (4.98), Catfish (4.93), Eels (4.93), Brotalus (4.85), Needlefish (4.72), Snook (4.64), Jacks (4.71), Pompano (4.83), Other Snapper (4.69), Gray Snapper (4.78), Grunt (4.36), Scianids (4.63), Spotted Seatrout (4.69), Red Drum (4.77), Spadefish (4.58), Parrotfish (3.86), Flatfish (4.58), Filefishes (4.78), Puffer (4.77), Other Pelagic Fishes (4.74), Small Herons and Egrets (4.85), Ibis (4.8), Roseate Spoonbill (4.91), Herbivorous Ducks (4.19), Omnivorous Ducks (4.28), Gruiformes (4.91), Small Shorebirds (4.93), Gulls and Terns (4.91), Kingfisher (4.86), Loggerhead Turtle (4.79), Hawksbill Turtle (4.71), Manatee (3.9).

Block 5 (average trophic level $$\mathbf{5.17}$$): Sharks (5.12), Tarpon (5.23), Grouper (5.06), Mackerel (5.2), Barracuda (5.21), Loon (5.21), Greeb (5.05), Pelican (5.2), Comorant (5.18), Big Herons and Egrets (5.14), Predatory Ducks (5.14), Raptors (5.66), Crocodiles (5.39), Dolphin (5.27), Water POC (5.04), Benthic POC (4.55), Output (5.22), Respiration (5.19).

Bold values are the average trophic level of each block.

Table 4 shows the percentage of directed edges observed between the blocks obtained by the BAS clustering algorithm, which exhibits the block upper-triangular shape of the adjacency matrix of the trophic network. Finally, we check that our classification is globally consistent with trophic levels. Let us denote by $$V$$ the set of $$n=128$$ agents, by $$\tau:V\rightarrow\{1,\ldots,5\}$$ the function assigning each agent to a block and by $$l:V\rightarrow\mathbb{R}^+$$ the trophic levels, such that $$l(x)$$ is the trophic level of agent $$x$$. We compute the inversion error

$$\frac{2}{n(n-1)}|\{(i,j)\in V\times V\text{ : }(\tau(i)-\tau(j))(l(i)-l(j))<0\}|$$ (7.4)

that is, the proportion of pairs $$i,j$$ for which block memberships are inconsistent with trophic levels. We obtain an inversion error of $$7\%$$, which means that the partitioning is consistent with the ground-truth trophic levels. In comparison, the partitioning obtained by applying the SVD clustering algorithm produces an inversion error of $$25\%$$. This confirms that our algorithm is more effective at detecting blocks in block-acyclic networks. As said before, the failure of the SVD clustering algorithm is due to the inability of stochastic blockmodels to capture the structure of trophic networks.

Table 4. Percentage of directed edges between blocks: entry $$(i,j)$$ equals $$100 E(i,j)/(S(i)S(j))$$, where $$E(i,j)$$ is the total number of directed edges from block $$i$$ to block $$j$$ and $$S(i)$$ and $$S(j)$$ are the numbers of nodes in blocks $$i$$ and $$j$$, respectively. The strictly upper-triangular part of the matrix appears in bold.
|   | Primary Producers | Primary Consumers | Secondary Consumers | Tertiary Consumers | Top-Level Predators |
| --- | --- | --- | --- | --- | --- |
| Primary Producers | $$6$$ | $$\mathbf{25}$$ | $$\mathbf{16}$$ | $$\mathbf{9}$$ | $$\mathbf{9}$$ |
| Primary Consumers | $$0$$ | $$7$$ | $$\mathbf{47}$$ | $$\mathbf{41}$$ | $$\mathbf{23}$$ |
| Secondary Consumers | $$0.4$$ | $$2$$ | $$9$$ | $$\mathbf{29}$$ | $$\mathbf{43}$$ |
| Tertiary Consumers | $$0$$ | $$0.3$$ | $$6$$ | $$2$$ | $$\mathbf{26}$$ |
| Top-Level Predators | $$0.6$$ | $$2$$ | $$4$$ | $$0$$ | $$7$$ |
7.2 Network of autonomous systems

As a second application, we apply BAS clustering algorithm to an internet-based network. Autonomous Systems are sets of computers sharing a common routing protocol, which roughly correspond to computers that get their internet connection from the same Internet Service Provider (ISP). Based on their importance, ISPs can be partitioned into tiers: ISPs in lower tiers sign business agreements with ISPs in higher tiers for data transit. Hence the network of money transfers between ISPs has a hierarchical structure [2]. We consider a graph $$G=(V,E,W)$$ in which vertices are ISPs and an edge $$(u,v)$$ represents a money transfer from ISP $$u$$ to ISP $$v$$ during a certain time span. We use the dataset published by the Center for Applied Internet Data Analysis [2] for 11 November 2013, which involves $$45\,427$$ ISPs (nodes) and $$230\,194$$ connections (edges).
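The inversion error of Equation 7.4, which is reused below to compare block ranks with transit degrees, can be sketched as follows (the function name and the brute-force implementation are ours):

```python
import numpy as np

def inversion_error(blocks, levels):
    """Proportion of node pairs whose block ranks disagree in order with a
    reference ranking (Equation 7.4); brute-force O(n^2) over all pairs."""
    blocks = np.asarray(blocks, dtype=float)
    levels = np.asarray(levels, dtype=float)
    n = len(blocks)
    db = blocks[:, None] - blocks[None, :]   # pairwise block-rank differences
    dl = levels[:, None] - levels[None, :]   # pairwise level differences
    inversions = np.count_nonzero(db * dl < 0) // 2   # each pair counted twice
    return inversions / (n * (n - 1) / 2)

# A block assignment fully consistent with the reference levels gives error 0.
print(inversion_error([1, 1, 2, 3], [0.5, 0.7, 1.2, 2.0]))  # → 0.0
```

For the trophic network, `blocks` would hold the five detected block indices and `levels` the ground-truth trophic levels.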
Unfortunately, business agreements between ISPs are often kept secret, hence these relationships are inferred from Border Gateway Protocol data using a heuristic method from [45]. To eliminate bidirectional connections between ISPs belonging to the same tier, we keep only the asymmetric part of $$W$$: $$W\leftarrow (W-W^T)_+$$. The resulting graph is expected to be block-acyclic [17], and the ranks of vertices in the block-acyclic structure should reflect the partitioning of ISPs into tiers. Previous research suggested partitioning ISPs into three tiers, with tier $$1$$ containing the most important ISPs in terms of data transit [46]. The spectrum of the transition matrix of the graph is shown in Fig. 21. We observe that there are indeed three eigenvalues with a significantly larger modulus. Hence we apply BAS clustering algorithm for the extraction of $$k=3$$ blocks. Figure 22 shows the result of the partitioning and the number of connections between each pair of blocks in the original network (of adjacency matrix $$W$$). This partitioning reveals the block-acyclic structure of the AS network.

Fig. 21. $$200$$ top eigenvalues of the transition matrix of the AS network. The three cycle eigenvalues (with largest modulus) used by BAS clustering algorithm are circle-shaped.

Fig. 22. Graph of the partitioning of nodes of the AS network into three blocks. The numbers of nodes in blocks 1, 2 and 3 are respectively 18 171, 20 896 and 6360.
The label of each edge represents the number of connections from one block to another block (in thousands).

In order to assess the quality of the partitioning, we compare it to two available measures of the importance of Autonomous Systems. We first consider the transit degree of ISPs, which measures the number of ISPs for which a given ISP provides a transit of data [2]. Considering the partitioning computed by BAS clustering algorithm, the average transit degree is $$0.31$$ in block $$1$$, $$5.16$$ in block $$2$$ and $$16.49$$ in block $$3$$. Moreover, the inversion error between transit degrees and the ranks computed by BAS clustering algorithm is $$6\%$$, which confirms the consistency of the partitioning with transit degrees. Secondly, we check the consistency of blocks with the grouping into tiers. There is no unique partitioning of ISPs into tiers, and most of the available ones only specify the ISPs in tier $$1$$. Hence we only check the coherence of our block partitioning with the most common version of tier $$1$$, which includes sixteen top ISPs in the world, among which AT&T (U.S.), Deutsche Telekom (Germany), etc. All of these tier $$1$$ ISPs are classified in block $$3$$ of the block-acyclic network. The blocks discovered by our BAS clustering algorithm are thus somewhat coherent with the traditional partitioning into tiers. However, the two classifications are not expected to coincide, since BAS clustering algorithm is based on the graph topology only, while tiers also take additional information into account, such as the ownership of an international fiber optic network.

8. Conclusion

In this article, we proposed two new algorithms to detect blocks of nodes in graphs with a block-cyclic or a block-acyclic structure. These algorithms are based on the computation of complex eigenvalues and eigenvectors of the transition matrix associated with the graph.
We showed that the algorithms succeed in detecting such blocks and that they outperform other methods that are theoretically able to extract clusters from block-cyclic or block-acyclic graphs. As we mentioned, we seek specific structural patterns (cyclic or acyclic) in graphs but make no assumption on the degrees of nodes or on the number of connections, which makes our tailored approach more efficient and cheaper than methods that seek more general patterns, such as algorithms based on stochastic blockmodels. Moreover, the time complexity of BCS and BAS clustering algorithms is linear in the number of edges in the graph, which is the same as for other spectral methods such as traditional spectral clustering. We also showed that BAS clustering algorithm provides a better understanding of real-world data: it extracts trophic levels from a food web and it exhibits the hierarchical structure of a worldwide network of autonomous systems. As we mentioned, the lower quality of the partitionings provided in block-cyclic and block-acyclic graphs by algorithms such as those based on stochastic blockmodels is partly due to assumptions of regularity within blocks which are not necessarily verified in some real-world networks such as food webs. In contrast, our algorithms detect block-cyclic and block-acyclic structures regardless of other regularity properties. Future work may include the development of a more advanced method for choosing a suitable number of clusters: although the proposed heuristic based on the average cluster silhouette provides an estimate of this number of blocks, it tends to make small mistakes in the presence of perturbation (e.g. detecting $$k+1$$ or $$k-1$$ blocks instead of $$k$$). Other methods for estimating the number of blocks could be based on an inspection of the graph itself, by defining a criterion for assessing the quality of a block partitioning.
Moreover, a more comprehensive theoretical analysis of nested block-cycles and block-acyclic graphs will be developed in order to assess the sensitivity of their cycle eigenvectors to the presence of perturbing edges, and to relate it to the theory already developed for block-cycles. Another research direction would be to generalize this framework to the detection of clusters with more complex patterns of connection. For instance, we may generalize our algorithms to graphs in which a subset of nodes forms a block-cyclic (or block-acyclic) structure. If this block-cyclic component is weakly connected to the rest of the graph, then a straightforward extension of BCS clustering algorithm would detect it successfully. Finally, BCS and BAS clustering algorithms could be applied to other real-world databases, such as trade networks, in which the exchange of goods might follow cyclic patterns between the different trading actors. In his book “The Human Group” [47], published in 1950, the American sociologist George Homans states that small social groups tend to form block-symmetric-acyclic networks, namely block-acyclic graphs with additional bidirectional links within blocks. It would be interesting to use BAS clustering algorithm to check whether this assumption is verified for large web-based social networks involving directed connections, such as Twitter.

Funding

This work presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and Optimisation), funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State, Science Policy Office, and of the ARC (Action de Recherche Concertée) on Mining and Optimization of Big Data Models, funded by the Wallonia-Brussels Federation.

A. Appendix

A.1 Proofs of the lemmas and the theorem for the block-cycles

The proofs of Lemma 3.1, Theorem 3.2 and Lemma 3.2 are provided below.
Lemma 3.1 (Existence of cycle eigenvalues) Let $$G=(V,E,W)$$ be a block-cycle with $$k$$ blocks $$V_1,\ldots,V_k$$ such that $$d_i^{out}>0$$ for all $$i\in V$$. Then $$\lambda_l=e^{-2\pi i\frac{l}{k}}\in spec(P)$$ for all $$0\leq l\leq k-1$$, namely there are $$k$$ eigenvalues located on the circle of radius $$1$$ centered at the origin of the complex plane. The right eigenvector associated to the eigenvalue $$e^{-2\pi i\frac{l}{k}}$$ is   $$u^l_j=\left\{ \begin{array}{ll} \frac{1}{\sqrt{n}}e^{2\pi i\frac{lk}{k}} & j\in V_1\\ \frac{1}{\sqrt{n}}e^{2\pi i\frac{l(k-1)}{k}} & j\in V_2\\ \vdots & \\ \frac{1}{\sqrt{n}}e^{2\pi i\frac{l}{k}} & j\in V_k \end{array}\right.$$ (A.1) for which $$Pu^l=\lambda_lu^l$$ and $$\Vert u^l\Vert_2=1$$. Moreover, if $$G$$ is strongly connected, then the eigenvalues $$\lambda_0,\ldots,\lambda_{k-1}$$ have multiplicity $$1$$ and all other eigenvalues of $$P$$ have a modulus strictly lower than $$1$$. Proof. For the $$l$$-th eigenvalue $$\lambda_l=e^{-2\pi i\frac{l}{k}}$$, we consider the following right eigenvector:   $$u^l_j=\left\{ \begin{array}{ll} \frac{1}{\sqrt{n}}e^{2\pi i\frac{lk}{k}} & j\in V_1\\ \frac{1}{\sqrt{n}}e^{2\pi i\frac{l(k-1)}{k}} & j\in V_2\\ \vdots & \\ \frac{1}{\sqrt{n}}e^{2\pi i\frac{l}{k}} & j\in V_k \end{array}\right.$$ (A.2) Then, for all $$j\in V_s$$ with $$1\leq s\leq k$$ (block indices taken modulo $$k$$, so that $$V_{k+1}=V_1$$):   $$[Pu^l]_j=\frac{1}{d^{out}_j}\sum_{r\in V_{s+1}}W_{jr}u^l_r.$$ (A.3) If $$j\in V_s$$ and $$r\in V_{s+1}$$, clearly $$u^l_r=e^{-2\pi i\frac{l}{k}}u^l_j$$, hence   $$\begin{array}{rcl} [Pu^l]_j &=& \frac{1}{d^{out}_j}\sum_{r\in V_{s+1}}W_{jr}u^l_r\\ &=& \frac{1}{d^{out}_j}\sum_{r\in V_{s+1}}W_{jr} e^{-2\pi i\frac{l}{k}}u^l_j\\ &=& e^{-2\pi i\frac{l}{k}}u^l_j\frac{1}{d^{out}_j}\sum_{r\in V_{s+1}}W_{jr}\\ &=& e^{-2\pi i\frac{l}{k}}u^l_j=\lambda_lu^l_j, \end{array}$$ (A.4) so that $$Pu^l=\lambda_lu^l$$. Let us assume now that $$G$$ is strongly connected. To show that the eigenvalues $$\lambda_0,\ldots,\lambda_{k-1}$$ have multiplicity $$1$$, we refer to the Perron–Frobenius theorem [26].
As $$G$$ is strongly connected, $$P$$ is irreducible, and the definition of a block-cycle implies that the period of $$G$$ is $$k$$. Hence, $$P$$ has exactly $$k$$ eigenvalues of modulus $$1$$, each of multiplicity $$1$$, while all its other eigenvalues have a modulus strictly lower than $$1$$. These $$k$$ eigenvalues are necessarily the cycle eigenvalues $$\lambda_0,\ldots,\lambda_{k-1}$$. □ Theorem 3.2 (Perturbation of cycle eigenvalues) Let $$G=(V,E,W)$$ be a strongly connected block-cycle with $$k$$ blocks $$V_1,\ldots,V_k$$ such that $$d_i^{out}>0$$ for all $$i\in V$$, let $$\lambda_0,\ldots,\lambda_{k-1}$$ be the $$k$$ cycle eigenvalues and $$u^0,\ldots,u^{k-1}$$ be the corresponding right cycle eigenvectors. Let $$\hat{G}=(V,\hat{E},\hat{W})$$ be a perturbed version of $$G$$ formed by appending positively weighted edges (excluding self-loops) to $$G$$. Let $$P$$ and $$\hat{P}$$ denote the transition matrices of $$G$$ and $$\hat{G}$$, respectively. We define the quantities   $$\begin{array}{rcl} \sigma &=&\underset{(i,j)\in\hat{E}}{\max}\text{ }\frac{\hat{d}_j^{in}}{\hat{d}_i^{out}}\\ \rho &=&\underset{i}{\max}\text{ }\frac{\hat{d}_i^{out}-d_i^{out}}{d_i^{out}} \end{array}$$ (A.5) where $$d^{in}_i$$, $$d^{out}_i$$, $$\hat{d}^{in}_i$$ and $$\hat{d}^{out}_i$$ represent the in-degree and out-degree of the $$i$$-th node in $$G$$ and $$\hat{G}$$, respectively. Then, 1. for any cycle eigenvalue $$\lambda_l\in spec(P)$$, there exists an eigenvalue $$\hat{\lambda}_l\in spec(\hat{P})$$ so that   $$\left\vert\hat{\lambda}_l-\lambda_l\right\vert \leq \sqrt{2n}\Vert f\Vert_2\sigma^{\frac{1}{2}}\rho^{\frac{1}{2}}+\mathscr{O}\left(\sigma\rho\right)$$ (A.6) where $$f$$ is the (left) Perron eigenvector of $$P$$, namely the vector $$f$$ verifying $$f^TP=f^T$$ with $$\Vert f\Vert_1=1$$, 2.
there exists a right eigenvector $$\hat{u}^l$$ of $$\hat{P}$$ associated to the eigenvalue $$\hat{\lambda}_l$$ verifying   $$\Vert\hat{u}^l-u^l\Vert_2\leq \sqrt{2}\Vert (\lambda_lI-P)^{\#}\Vert_2 \sigma^{\frac{1}{2}}\rho^{\frac{1}{2}}+\mathscr{O}\left(\sigma\rho\right)$$ (A.7) where $$u^l$$ is the right eigenvector of $$P$$ associated to the eigenvalue $$\lambda_l$$ and $$(\lambda_lI-P)^{\#}$$ denotes the Drazin generalized inverse of $$(\lambda_lI-P)$$, 3. the node embeddings $$\{x_1,\ldots,x_n\}\subset\mathbb{C}^k$$ and $$\{\hat{x}_1,\ldots,\hat{x}_n\}\subset\mathbb{C}^k$$, defined as the rows of the right eigenvector matrices $$U=[u^0,\ldots,u^{k-1}]\in\mathbb{C}^{n\times k}$$ and $$\hat{U}=[\hat{u}^0,\ldots,\hat{u}^{k-1}]\in\mathbb{C}^{n\times k}$$ respectively, verify   $$\Vert x_j - \hat{x}_j\Vert_2 \leq k\sqrt{2\sigma\rho}\underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2+\mathscr{O}\left(\sigma\rho\right)$$ (A.8) for each $$j\in\{1,\ldots,n\}$$. Proof We restrict ourselves to a perturbation that only consists of edges not already in $$E$$, as appending an edge that is already in $$E$$ amounts to increasing the weight of this edge in $$G$$, which does not affect the block-cyclic structure of $$G$$. The transition matrix $$\hat{P}$$ can be expressed in the form $$\hat{P}=P+R$$ with $$R$$ given by   $$R_{ij}=\left\lbrace \begin{array}{ll} \frac{\hat{W}_{ij}}{\hat{d}_i^{out}} & \text{ if }(i,j)\in \hat{E}\setminus E\\ \frac{\hat{W}_{ij}}{\hat{d}_i^{out}}-\frac{W_{ij}}{d_i^{out}} & \text{ if }(i,j)\in E\cap \hat{E}\\ 0 & \text{ otherwise.} \end{array}\right.$$ (A.9) To prove claim $$1$$, we recall that the cycle eigenvalues $$\{\lambda_l,0\leq l\leq k-1\}$$ have multiplicity $$1$$ by Lemma 3.1.
Hence, from Theorem 2.3 page 183 of [28], for each cycle eigenvalue $$\lambda_l$$, there exists an eigenvalue $$\hat{\lambda}_l\in spec(\hat{P})$$ such that   $$\hat{\lambda}_l=\lambda_l +\frac{(y^l)^HRu^l}{(y^l)^Hu^l}+\mathscr{O}(\Vert R\Vert_2^2)$$ (A.10) where $$u^l$$ and $$y^l$$ are the right and left eigenvectors of $$P$$ associated to the cycle eigenvalue $$\lambda_l$$, with $$\Vert u^l\Vert_2=1$$. Hence,   $$\left\vert\hat{\lambda}_l-\lambda_l\right\vert \leq \frac{\left\vert(y^l)^HRu^l\right\vert}{\left\vert(y^l)^Hu^l\right\vert}+\mathscr{O}(\Vert R\Vert_2^2).$$ (A.11) We first express the denominator of the first-order term. One can show that the normalized right and left eigenvectors associated to the cycle eigenvalues are   $$\begin{array}{rclcl} u_j^l&=&\frac{1}{\sqrt{n}} e^{il(k-r+1)\frac{2\pi}{k}}&\text{ for }&j\in V_r\\ y_j^l&=&f_j\sqrt{n}e^{-il(r-1)\frac{2\pi}{k}}&\text{ for }&j\in V_r \end{array}$$ (A.12) where $$f$$ is the (left) Perron eigenvector of $$P$$, namely the vector $$f$$ with non-negative real entries verifying $$f^TP=f^T$$ and $$\Vert f\Vert_1=1$$. As $$f$$ is real and non-negative by the Perron–Frobenius theorem, we have (with $$r$$ denoting the block of node $$j$$)   $$\begin{array}{r@{\ }c@{\ }l} (y^l)^Hu^l &=& \sum_j (y_j^l)^*u_j^l\\ &=& \sum_j f_je^{il(r-1)\frac{2\pi}{k}}e^{il(k-r+1)\frac{2\pi}{k}}\\ &=& \sum_j e^{2\pi li}f_j\\ &=& \sum_j f_j\\ &=& 1. \end{array}$$ (A.13) Regarding the numerator, we have   $$\left\vert(y^l)^HRu^l\right\vert\leq \Vert y^l\Vert_2 \Vert R\Vert_2 \Vert u^l\Vert_2=\Vert y^l\Vert_2\Vert R\Vert_2=\sqrt{n}\Vert f\Vert_2\Vert R\Vert_2$$ (A.14) where $$\Vert R\Vert_2=\sqrt{\lambda_{\max}(R^TR)}$$.
By the Gershgorin circle theorem, we have   $$\lambda_{\max}(R^TR) \leq \underset{t}{\max}\sum_j\sum_s \left\vert\frac{\hat{W}_{st}}{\hat{d}^{out}_s}-\frac{W_{st}}{d^{out}_s}\right\vert \left\vert\frac{\hat{W}_{sj}}{\hat{d}^{out}_s}-\frac{W_{sj}}{d^{out}_s}\right\vert.$$ (A.15) We simplify the right-hand side:   $$\sum_j\sum_s \left\vert\frac{\hat{W}_{st}}{\hat{d}^{out}_s}-\frac{W_{st}}{d^{out}_s}\right\vert \left\vert\frac{\hat{W}_{sj}}{\hat{d}^{out}_s}-\frac{W_{sj}}{d^{out}_s}\right\vert =\sum_s \left\vert\frac{\hat{W}_{st}}{\hat{d}^{out}_s}-\frac{W_{st}}{d^{out}_s}\right\vert \sum_j \left\vert\frac{\hat{W}_{sj}}{\hat{d}^{out}_s}-\frac{W_{sj}}{d^{out}_s}\right\vert$$ (A.16) in which   $$\begin{array}{r@{\ }c@{\ }l} \sum_j \left\vert\frac{\hat{W}_{sj}}{\hat{d}^{out}_s}-\frac{W_{sj}}{d^{out}_s}\right\vert &=& \underset{j:W_{sj}=0}{\sum}\frac{\hat{W}_{sj}}{\hat{d}^{out}_s}+\underset{j:W_{sj}\neq 0}{\sum}W_{sj}(\frac{1}{d^{out}_s}-\frac{1}{\hat{d}^{out}_s})\\ &=& \frac{\hat{d}^{out}_s-d^{out}_s}{\hat{d}^{out}_s}+1-\frac{d_s^{out}}{\hat{d}^{out}_s}\\ &=& 2(1-\frac{d_s^{out}}{\hat{d}_s^{out}})\\ &=& 2\frac{1}{1+\frac{d_s^{out}}{\hat{d}_s^{out}-d_s^{out}}}\\ &\leq& 2\frac{\rho}{\rho +1} \end{array}$$ (A.17) where the first equality follows from $$\underset{j:W_{sj}\neq 0}{\sum}W_{sj}=d_s^{out}$$ and $$\underset{j:W_{sj}= 0}{\sum}\hat{W}_{sj}=\hat{d}_s^{out}-d_s^{out}$$ (the perturbation leaves the weights of existing edges unchanged), and the last inequality follows from $$\frac{\hat{d}_s^{out}-d_s^{out}}{d_s^{out}}\leq\rho$$.
Moreover, we have   $$\begin{array}{r@{\ }c@{\ }l} \sum_s \left\vert\frac{\hat{W}_{st}}{\hat{d}^{out}_s}-\frac{W_{st}}{d^{out}_s}\right\vert & = & \sum_{s:W_{st}=0}\frac{\hat{W}_{st}}{\hat{d}_s^{out}}+\sum_{s:W_{st}\neq 0}W_{st}(\frac{1}{d_s^{out}}-\frac{1}{\hat{d}_s^{out}})\\ &\leq& \sum_{s:W_{st}=0}\frac{\hat{W}_{st}}{\hat{d}_s^{out}}+\sum_{s:W_{st}\neq 0}\frac{W_{st}}{d_s^{out}}\\ &\leq& \sum_s \frac{\hat{W}_{st}}{d_s^{out}}\\ &=& \sum_s \frac{\hat{W}_{st}}{\hat{d}_s^{out}}\frac{\hat{d}_s^{out}}{d_s^{out}}\\ &\leq& (\rho+1)\sum_s \frac{\hat{W}_{st}}{\hat{d}_s^{out}}\\ &\leq& (\rho+1) \frac{\hat{d}^{in}_t}{\underset{s:(s,t)\in \hat{E}}{\min}\hat{d}_s^{out}}\\ &\leq & \sigma(\rho+1). \end{array}$$ (A.18) Combining Equations A.17 and A.18 with Equation A.16, we obtain   $$\begin{array}{r@{\ }c@{\ }l} \sum_s \left\vert\frac{\hat{W}_{st}}{\hat{d}^{out}_s}-\frac{W_{st}}{d^{out}_s}\right\vert \sum_j \left\vert\frac{\hat{W}_{sj}}{\hat{d}^{out}_s}-\frac{W_{sj}}{d^{out}_s}\right\vert &\leq& 2\sigma(\rho+1)\frac{\rho}{\rho +1}\\ &=& 2\sigma\rho. \end{array}$$ (A.19) Hence $$\Vert R\Vert_2\leq\sqrt{2\sigma\rho}$$, and the first-order term of Equation A.11 is bounded by   $$\frac{\left\vert(y^l)^HRu^l\right\vert}{\left\vert(y^l)^Hu^l\right\vert}\leq\sqrt{2n}\Vert f\Vert_2\sigma^{\frac{1}{2}}\rho^{\frac{1}{2}}.$$ (A.20) Finally, we obtain the upper bound on the perturbation of the cycle eigenvalues:   $$\left\vert\hat{\lambda}_l-\lambda_l\right\vert \leq \sqrt{2n}\Vert f\Vert_2\sigma^{\frac{1}{2}}\rho^{\frac{1}{2}}+\mathscr{O}\left(\sigma\rho\right)$$ (A.21) which proves the first claim. To prove the second claim, as the cycle eigenvalue $$\lambda_l$$ has multiplicity $$1$$, from Theorem 2.8 p.
238 of [28], there exists a right eigenvector $$\hat{u}^l$$ of $$\hat{P}$$ associated to the eigenvalue $$\hat{\lambda}_l$$ verifying   $$\Vert \hat{u}^l-u^l\Vert_2\leq \Vert (\lambda_lI-P)^{\#}\Vert_2 \Vert R\Vert_2+\mathscr{O}\left(\Vert R\Vert_2^2\right)$$ (A.22) where $$u^l$$ is the right eigenvector of $$P$$ associated to the eigenvalue $$\lambda_l$$ and $$(\lambda_lI-P)^{\#}$$ denotes the Drazin generalized inverse of $$(\lambda_lI-P)$$. From Equation A.19, this expression becomes   $$\Vert \hat{u}^l-u^l\Vert_2\leq \sqrt{2}\Vert (\lambda_lI-P)^{\#}\Vert_2 \sigma^{\frac{1}{2}}\rho^{\frac{1}{2}}+\mathscr{O}\left(\sigma \rho\right)$$ (A.23) To prove the third claim, we observe that the node embeddings $$\{x_1,\ldots,x_n\}\subset\mathbb{C}^k$$ and $$\{\hat{x}_1,\ldots,\hat{x}_n\}\subset\mathbb{C}^k$$ verify   $$\begin{array}{r@{\ }c@{\ }l} \Vert x_j - \hat{x}_j\Vert_2 &\leq& \Vert x_j - \hat{x}_j\Vert_1\\ &\leq& k\underset{l=0,\ldots,k-1}{\max}\Vert u^l - \hat{u}^l\Vert_2\\ &\leq& k\sqrt{2\sigma\rho}\underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2+\mathscr{O}\left(\sigma\rho\right) \end{array}$$ (A.24) where the last inequality follows from the second claim.
□ Lemma 3.2 (Distance between centroids) The $$k$$ distinct forms $$c^0,\ldots,c^{k-1}$$ that can be taken by the rows of the right eigenvector matrix $$U$$ in the case of an unperturbed block-cycle (defined by Equation 3.4) are pairwise equidistant and, for all $$r\neq s\in\{0,\ldots,k-1\}$$,   $$\Vert c^r-c^s\Vert_2=\sqrt{\frac{2k}{n}}.$$ (A.25) Proof For $$r,s\in\{0,\ldots,k-1\}$$ with $$r\neq s$$,   \begin{align} \Vert c^r-c^s\Vert_2^2 &= \frac{1}{n}\sum\limits_{l=0}^{k-1} \left| e^{\frac{2\pi i}{k}l(k-r+1)}-e^{\frac{2\pi i}{k}l(k-s+1)}\right|^2\nonumber\\ &= \frac{4}{n}\sum\limits_{l=0}^{k-1} \sin^2\left(\frac{\pi l}{k}(s-r)\right)\nonumber\\ &= \frac{2}{n}\sum\limits_{l=0}^{k-1}\left(1-\cos\left(\frac{2\pi l}{k}(r-s)\right)\right)\nonumber\\ &= \frac{2k}{n} \end{align} (A.26) where the second equality uses the identity $$|e^{ia}-e^{ib}|^2=4\sin^2\left(\frac{a-b}{2}\right)$$ and the last one follows from the fact that the cosine terms sum to zero since $$0<|r-s|<k$$. This completes the proof. □ A.2 Analysis of the Drazin generalized inverse We first prove the validity of Equation 3.14, then we provide empirical evidence that Equations 3.15 and 3.16 are valid.
Lemma A.1 Let $$G=(V,E,W)$$ be a block-cycle such that $$d_i^{out}>0$$ for all $$i\in V$$, let $$P$$ be the transition matrix of $$G$$ and let $$\{\lambda_1,\ldots,\lambda_k\}$$ be the cycle eigenvalues of $$G$$. If the matrix $$P$$ is diagonalizable, then the norm of the Drazin generalized inverse of $$(\lambda_lI-P)$$ for any cycle eigenvalue $$\lambda_l$$ verifies   $$\Vert (\lambda_lI-P)^{\#}\Vert_2\leq (n-1)\left(\underset{\lambda\in \text{spec}(P)\setminus\{\lambda_l\}}{\min}|\lambda-\lambda_l|\right)^{-1}(\Vert Y^C_{-l}\Vert_2+\Vert Y^{NC}\Vert_2)$$ (A.27) where $$Y^C_{-l}$$ and $$Y^{NC}$$ represent the matrices of cycle and non-cycle left eigenvectors, respectively: if $$\{u^r\text{ : }1\leq r\leq n\}$$ and $$\{y^r\text{ : }1\leq r\leq n\}$$ are the right and left eigenvectors of $$P$$ such that $$\Vert u^1\Vert_2=\ldots=\Vert u^n\Vert_2=1$$ and $$(y^1)^Hu^1=\ldots=(y^n)^Hu^n=1$$, then   $$\begin{array}{rcl} Y^{NC}&=&[y^{k+1},\ldots,y^n]\\ Y^C_{-l}&=&[y^1,\ldots,y^{l-1},y^{l+1},\ldots,y^k], \end{array}$$ (A.28) namely $$Y^C_{-l}$$ is the concatenation of the left cycle eigenvectors except the $$l$$-th one and $$Y^{NC}$$ is the concatenation of all non-cycle left eigenvectors. Proof Let $$l=1$$, without loss of generality. Define the matrix of right eigenvectors $$U=[u^2,\ldots,u^n]$$ and the matrix of corresponding left eigenvectors $$Y=[y^2,\ldots,y^n]$$ (excluding the first right and left eigenvectors $$u^1$$ and $$y^1$$) such that $$\Vert u^2\Vert_2=\ldots=\Vert u^n\Vert_2=1$$ and $$Y^HU=I$$. From the definition of the Drazin generalized inverse p.
238 of [28],   $$(\lambda_lI-P)^{\#}=U(\lambda_lI-Y^HPU)^{-1}Y^H.$$ (A.29) Hence,   $$\begin{array}{rcl} \Vert (\lambda_lI-P)^{\#}\Vert_2&=&\Vert U(\lambda_lI-Y^HPU)^{-1}Y^H\Vert_2\\ &\leq & \Vert U\Vert_2 \Vert (\lambda_lI-Y^HPU)^{-1}\Vert_2 \Vert Y\Vert_2 \end{array}$$ (A.30) where the first factor satisfies   $$\Vert U\Vert_2 \leq \Vert U\Vert_F\leq \sum_{r\neq 1}\Vert u^r\Vert_2=(n-1),$$ (A.31) the second factor verifies (as $$Y^HPU$$ is the diagonal matrix of the eigenvalues other than $$\lambda_l$$)   $$\Vert (\lambda_lI-Y^HPU)^{-1}\Vert_2 = \left(\underset{\lambda\in \text{spec}(P)\setminus\{\lambda_l\}}{\min}|\lambda-\lambda_l|\right)^{-1}$$ (A.32) and the third factor verifies   $$\Vert Y\Vert_2 \leq\Vert Y^C_{-l}\Vert_2+\Vert Y^{NC}\Vert_2$$ (A.33) which follows from Weyl’s inequality [49] and from the fact that $$Y=[Y^{C}_{-l},Y^{NC}]$$. Hence,   $$\Vert (\lambda_lI-P)^{\#}\Vert_2\leq (n-1)\left(\underset{\lambda\in \text{spec}(P)\setminus\{\lambda_l\}}{\min}|\lambda-\lambda_l|\right)^{-1}(\Vert Y^C_{-l}\Vert_2+\Vert Y^{NC}\Vert_2).$$ (A.34) □ We next provide empirical evidence that Equations 3.15 and 3.16 are valid in practice. Regarding Equation 3.15, we show that for any block-cycle $$G$$ of at least $$k=7$$ blocks with diagonalizable transition matrix $$P$$ and for any cycle eigenvalue $$\lambda_l$$, we have   $$\underset{\lambda\in \text{spec}(P)\setminus\{\lambda_l\}}{\min}|\lambda-\lambda_l|=2\sin\left(\frac{\pi}{k}\right)\!,$$ (A.35) which represents the shortest distance between any two distinct cycle eigenvalues. Hence, to show that Equation A.35 is valid, we verify whether the eigenvalue closest to a cycle eigenvalue is itself a cycle eigenvalue. This cannot hold when $$k\leq 6$$, since we then have $$2\sin\left(\frac{\pi}{k}\right)\geq 1$$ and a non-cycle eigenvalue, of modulus strictly lower than $$1$$, may lie closer. For $$k\geq 7$$ blocks, however, we provide empirical evidence that the above claim holds.
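As a minimal numerical illustration of Lemma 3.1 and of the claim above (the construction and variable names are ours), consider a complete block-cycle with $$k=7$$ balanced blocks: the transition matrix then carries the $$k$$ cycle eigenvalues on the unit circle while all remaining eigenvalues vanish, so the eigenvalue closest to any cycle eigenvalue is indeed another cycle eigenvalue, since $$2\sin(\pi/7)<1$$.

```python
import numpy as np

k, m = 7, 10                         # k blocks of m nodes each
n = k * m
W = np.zeros((n, n))
for b in range(k):                   # block b fully connected to block b+1 (mod k)
    nxt = (b + 1) % k
    W[b * m:(b + 1) * m, nxt * m:(nxt + 1) * m] = 1.0
P = W / W.sum(axis=1, keepdims=True)  # transition matrix of the block-cycle

eig = np.linalg.eigvals(P)
eig = eig[np.argsort(-np.abs(eig))]   # sort by decreasing modulus
# Lemma 3.1: each cycle eigenvalue e^{-2*pi*i*l/k} appears in the spectrum.
for l in range(k):
    lam = np.exp(-2j * np.pi * l / k)
    assert np.min(np.abs(eig - lam)) < 1e-8
# For this complete block-cycle, all non-cycle eigenvalues are zero.
assert np.max(np.abs(eig[k:])) < 1e-8
```

With perturbing edges added, the non-cycle eigenvalues move away from zero, which is exactly the regime studied empirically below.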
If $$C=\{1,2,\ldots,k\}$$ and $$\lambda_1,\ldots,\lambda_k$$ are the $$k$$ cycle eigenvalues, we display in Figure A.1a the evolution of the quantities   $$\begin{array}{rcl} \mu^C&=&\underset{r,s\in C}{\min}|\lambda_r-\lambda_s|\\ \mu^{NC}&=&\underset{s\notin C, r\in C}{\min}|\lambda_r-\lambda_s| \end{array}$$ (A.36) for a block-cycle generated by a stochastic blockmodel $$SBM(k,\gamma,\Phi)$$ with $$k=7$$ balanced blocks and   $$\Phi_{rs}=\left\lbrace\begin{array}{ll} \alpha & \text{ if }(r,s)\in\{(1,2),(2,3),\ldots,(k-1,k),(k,1)\}\\ 0 & \text{ otherwise} \end{array}\right.$$ (A.37) and for increasing values of the probability $$\alpha$$ of connection between consecutive blocks. Figure A.1a shows that $$\mu^C\leq\mu^{NC}$$ for all values of $$\alpha$$ that were tested, which confirms that the eigenvalue closest to any cycle eigenvalue is itself a cycle eigenvalue and thus that Equation A.35 holds. A similar phenomenon is observed for any number $$k$$ of blocks above $$7$$. Regarding Equation 3.16, the definition of the left cycle eigenvectors (Equation A.12) gives   $$\Vert Y^C_{-l}\Vert_2 \leq \Vert Y^C_{-l}\Vert_F \leq\sum\limits_{\substack{r=1\\r\neq l}}^k\Vert y^r\Vert_2=(k-1)\sqrt{n}\Vert f\Vert_2$$ (A.38) where $$f$$ is the Perron eigenvector verifying $$f^TP=f^T$$ and $$f^T\mathbf{1}=1$$. To provide more insight into the norm of the matrix   $$Y^{NC}=[y^{k+1},\ldots,y^n]$$ (A.39) involved in Equation A.34, Figure A.1b displays the evolution of $$\Vert Y^{NC}\Vert_2$$ for a block-cycle generated by the same $$SBM(k,\gamma,\Phi)$$ stochastic blockmodel and for increasing values of $$\alpha$$. While this quantity remains relatively constant with respect to $$\alpha$$, providing a bound on $$\Vert Y^{NC}\Vert_2$$ is tedious since it involves the condition numbers of all the non-cycle eigenvectors.
However, taking inspiration from Equation A.38, we do observe that   $$\Vert Y^{NC}\Vert_2\leq (n-k)\sqrt{n}\Vert f\Vert_2$$ (A.40) when the blocks are balanced, as can be seen from Figure A.1b. The inequality was verified for different values of $$k$$, and a future research direction would be to determine under which conditions Equation A.40 holds. Fig. A.1. (a) Evolution of the minimum distance between cycle and non-cycle eigenvalues ($$\mu^{NC}$$ in Equation A.36) and of the minimum distance between cycle eigenvalues ($$\mu^{C}$$ in Equation A.36) as a function of the probability of connection between consecutive blocks. (b) Evolution of $$\Vert Y^{NC}\Vert_2$$ for a block-cycle generated by the same $$SBM(k,\gamma,\Phi)$$ stochastic blockmodel and for increasing values of the probability $$\alpha$$ of connection between consecutive blocks; the proposed bound is also displayed. A.3 Proof of the lower bound on the eigenvalues of a nested block-cycle Lemma 4.1 states that there are at least $$k$$ eigenvalues of modulus greater than $$\frac{1}{2k-1}$$ for the transition matrix of a complete nested block-cycle with $$k$$ equally sized blocks. By complete, we mean that all edges between the blocks allowed by Definition 4.1 are present in the graph. The proof of Lemma 4.1 is given below.
Lemma 4.1 (Lower bound on eigenvalues of a nested block-cycle) For a nested block-cycle $$G=(V,E)$$ of $$k$$ blocks of equal size, if the nested block-cycle is complete, then there exist at least $$k$$ eigenvalues $$\lambda_1,\ldots,\lambda_k$$ of the transition matrix $$P$$ such that   $$\vert \lambda_l\vert \geq \frac{1}{2k-1}.$$ (A.41) Proof We first consider the case in which each block consists of one node only, namely there are $$n=k$$ nodes in the nested block-cycle. We denote by $$\Pi\in [0,1]^{k\times k}$$ the transition matrix in this case. If the nested block-cycle is complete, namely if all possible connections are present, then the transition matrix of the graph is   $$\Pi_{ij}=\left\lbrace\begin{array}{cl} \frac{1}{k-i}&\text{ if }1\leq i<j\leq k\\ \frac{1}{k}&\text{ if }i=k\\ 0&\text{ otherwise.} \end{array}\right.$$ (A.42) One can show that the matrix $$\Pi$$ is diagonalizable, and we denote its eigenvalues by $$\lambda_1,\ldots,\lambda_k$$ with $$|\lambda_1|\leq \ldots\leq |\lambda_k|$$. Then, the inverse $$\Pi^{-1}$$ of $$\Pi$$ is the matrix $$\Phi$$ given by   $$\Phi_{ij}=\left\lbrace\begin{array}{cl} -(k-i)&\text{ if }i=j<k\\ k-j&\text{ if }i=j+1\\ k&\text{ if }(i,j)=(1,k)\\ 0&\text{ otherwise.} \end{array}\right.$$ (A.43) Indeed, if $$i,j<k$$,   $$\sum\limits_l\Pi_{il}\Phi_{lj}=\frac{1}{k-i}\sum\limits_{l=i+1}^k\Phi_{lj}=\left\lbrace\begin{array}{ll} \frac{k-j}{k-i}=1&\text{ if }i=j\\ 0&\text{ otherwise,} \end{array}\right.$$ (A.44) if $$i<k$$ and $$j=k$$,   $$\sum\limits_l\Pi_{il}\Phi_{lk}=\frac{1}{k-i}\sum\limits_{l=i+1}^k\Phi_{lk}=0,$$ (A.45) if $$i=k$$ and $$j<k$$,   $$\sum\limits_l\Pi_{kl}\Phi_{lj}=\frac{1}{k}\sum\limits_{l=1}^k\Phi_{lj}=0,$$ (A.46) and if $$i=j=k$$,   $$\sum\limits_l\Pi_{kl}\Phi_{lk}=\frac{1}{k}\sum\limits_{l=1}^k\Phi_{lk}=\frac{1}{k}\,k=1.$$ (A.47) Hence, $$\Phi=\Pi^{-1}$$.
To provide an upper bound on the spectral radius $$\rho(\Pi^{-1})$$ of $$\Pi^{-1}$$, we apply the Gershgorin circle theorem, which implies that for each eigenvalue $$\mu$$ of $$\Pi^{-1}$$, at least one of the following inequalities holds   $$\left\lbrace\begin{array}{ll} |\mu+(k-i)|\leq k-i+1 &\text{ for }i=1,\ldots,k-1\\ |\mu|\leq k. \end{array}\right.$$ (A.48) Since $$|\mu|=|\mu+(k-i)-(k-i)|\leq |\mu+(k-i)|+(k-i)$$, at least one of the following $$k$$ inequalities holds   $$\left\lbrace\begin{array}{ll} |\mu|\leq 2k-2i+1 &\text{ for }i=1,\ldots,k-1\\ |\mu|\leq k. \end{array}\right.$$ (A.49) Since the largest of the bounds on $$|\mu|$$ in the above inequalities is $$2k-1$$, we have $$|\mu|\leq 2k-1$$ for each eigenvalue $$\mu$$ of $$\Pi^{-1}$$, which implies   $$\rho(\Pi^{-1})\leq 2k-1.$$ (A.50) Moreover, since $$\text{spec}(\Pi^{-1})=\{\frac{1}{\lambda_1},\ldots,\frac{1}{\lambda_k}\}$$, we have   $$\frac{1}{|\lambda_1|}=\rho(\Pi^{-1})\leq 2k-1$$ (A.51) and   $$|\lambda_k|\geq |\lambda_{k-1}|\geq \cdots\geq |\lambda_1|\geq \frac{1}{2k-1}.$$ (A.52) Now we consider the general case of a complete nested block-cycle of $$n$$ nodes, with $$k$$ blocks $$S^1,\ldots,S^k$$ of equal size $$\frac{n}{k}$$. Then, we have   $$\lambda\in \text{spec}(\Pi)\Rightarrow \lambda\in \text{spec}(P).$$ (A.53) Indeed, for each right eigenvector $$v\in \mathbb{C}^k\setminus \{0\}$$ such that $$\Pi v=\lambda v$$ for some $$\lambda\in\mathbb{C}$$, we define the vector $$\tilde{v}\in\mathbb{C}^{n}$$ such that   $$\tilde{v}_i=v_{\tau(i)}$$ (A.54) where $$\tau(i)$$ is the block of node $$i$$.
Then, for $$\tau(i)<k$$, we have   $$\sum\limits_jP_{ij}\tilde{v}_j=\sum\limits_{r=\tau(i)+1}^k\frac{k}{n(k-\tau(i))}|S^r|v_r=\sum\limits_{r=\tau(i)+1}^k\frac{k}{n(k-\tau(i))}\frac{n}{k}v_r=\sum\limits_{r=\tau(i)+1}^k\frac{1}{k-\tau(i)}v_r=\lambda v_{\tau(i)}=\lambda\tilde{v}_i$$ (A.55) and, for $$\tau(i)=k$$, we have   $$\sum\limits_jP_{ij}\tilde{v}_j=\sum\limits_{r=1}^k\frac{1}{n}|S^r|v_r=\sum\limits_{r=1}^k\frac{1}{n}\frac{n}{k}v_r=\sum\limits_{r=1}^k\frac{1}{k}v_r=\lambda v_k=\lambda\tilde{v}_i,$$ (A.56) which implies that $$P\tilde{v}=\lambda\tilde{v}$$. Hence, from Equation A.52, there are at least $$k$$ eigenvalues $$\lambda_1,\ldots,\lambda_k$$ in the spectrum of $$P$$ such that   $$|\lambda_l|\geq\frac{1}{2k-1}$$ (A.57) for each $$l\in\{1,\ldots,k\}$$. □ A.4 Perturbation analysis for nested block-cycles In this subsection, we provide a brief empirical analysis of the quantities involved in the perturbation bounds for nested block-cycles (Equations 4.3, 4.4 and 4.5), namely the maximum norm of the left cycle eigenvectors $$M_1=\underset{l}{\max}\Vert y^l\Vert_2$$ and the maximum norm of the Drazin inverse $$M_2=\underset{l}{\max}\Vert (\lambda_lI-P)^{\#}\Vert_2$$. A nested block-cycle is formed from a block-cycle of $$k=8$$ balanced blocks with a probability of $$0.7$$ of connection between consecutive blocks; edges that satisfy Definition 4.1 of a nested block-cycle are then randomly appended to the graph with an increasing edge probability $$\alpha$$. Figure A.2a shows the evolution of the maximum norm $$M_1$$ of the left cycle eigenvectors as a function of $$\alpha$$. $$M_1$$ tends to increase smoothly (up to ten times its initial value) with $$\alpha$$, namely when the graph evolves from a purely block-cyclic structure towards a structure of nested block-cycle. Similarly, Figure A.2b shows that the maximum norm $$M_2$$ of the Drazin generalized inverse increases in a linear fashion with respect to $$\alpha$$.
The increase in both $$M_1$$ and $$M_2$$ explains the higher sensitivity of our BAS algorithm to perturbations in the detection of blocks in block-acyclic graphs, when compared to the performance of our BCS algorithm for block-cycles. Fig. A.2. (left) Evolution of the maximum norm of the left cycle eigenvectors $$y^l$$ as a function of the probability of connection between consecutive blocks. (right) Evolution of the maximum norm of the Drazin generalized inverse $$(\lambda_lI-P)^{\#}$$ among all cycle eigenvalues as a function of the probability of connection between consecutive blocks. Footnotes 1 Typically, experiments show that up to $$3|E|$$ perturbing edges can be appended to a block-cycle without affecting the accuracy of our clustering algorithm. 2 The proof of the lemma is provided in Appendix. 3 The proof of Theorem 3.2 is provided in Appendix. 4 The proof of the lemma is provided in Appendix. 5 In the unweighted case, the probability of existence of a perturbing edge connecting any pair of nodes is equal. 6 This is the case when, for instance, the block-cycle is generated by a stochastic blockmodel with an equal probability of connection between any pair of consecutive blocks in the block-cycle. 7 i.e. verifying $$Pu^r=\lambda u^r$$ and $$(y^r)^HP=\lambda (y^r)^H$$ for some scalar $$\lambda\in\mathbb{C}$$. 8 Typically a perturbation of $$|E|$$ perturbing edges. 9 We refer to [32] for more details about the convergence of iterative algorithms such as the Arnoldi iteration for the computation of eigenvalues of non-symmetric matrices.
10 The proof of Lemma 4.1 is provided in Appendix A.3. 11 Defined as $$\frac{|\hat{E}|-|E|}{|E|}$$, i.e. the total weight of the perturbing edges divided by the total weight of the edges in the nested block-cycle. 12 Blue triangles in Fig. 7a. 13 For the sake of simplicity, and since the block membership error exhibits the same pattern, we only display the NMI-based error for this test. 14 For a predicted number of clusters $$\tilde{k}$$ and a true number of clusters $$k$$, we define the relative error as $$\frac{|\tilde{k}-k|}{k}$$. 15 For the sake of clarity, we explicitly refer to the block-cycle and the cycle eigenvalues in this lemma, but it is actually a general result that holds for any diagonalizable matrix. 16 Note that ensuring $$Y^HU=I$$ is always possible for $$P$$ diagonalizable since any left eigenvector is orthogonal to any right eigenvector corresponding to a different eigenvalue and, for degenerate eigenvalues, this can be achieved by a Gram–Schmidt orthogonalization as suggested on page 451 of [48]. References 1. Elton C. S. (1927) Animal Ecology (Huxley J. S. ed.), 1st edn. New York, NY: The Macmillan Company. 2. Center for Applied Internet Data Analysis (2013) Dataset of AS relationships. La Jolla, California: CAIDA, UCSD, http://www.caida.org/data/as-relationships/ (last accessed 18 September 2017). 3. Post W. M., Peng T.-H., Emanuel W. R., King A. W., Dale V. H. & DeAngelis D. L. (1990) The global carbon cycle. Am. Sci., 78, 310–326. 4. Malliaros F. D. & Vazirgiannis M. (2013) Clustering and community detection in directed networks: a survey. Phys. Rep., 533, 95–142. 5. Chung F. (2005) Laplacians and the Cheeger inequality for directed graphs. Ann. Combinator., 9, 1–19. 6. Huang J., Zhu T. & Schuurmans D. (2006) Web communities identification from random walks.
Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases: PKDD 2006 (Fürnkranz J., Scheffer T. & Spiliopoulou M. eds). Berlin, Germany: Springer, pp. 187–198. 7. Reichardt J. & White D. R. (2007) Role models for complex networks. Eur. Phys. J. B Condens. Matter, 60, 217–224. 8. Sussman D. L., Tang M., Fishkind D. E. & Priebe C. E. (2012) A consistent adjacency spectral embedding for stochastic blockmodel graphs. J. Am. Stat. Assoc., 107, 1119–1128. 9. Satuluri V. & Parthasarathy S. (2011) Symmetrizations for clustering directed graphs. Proceedings of the 14th International Conference on Extending Database Technology. New York, NY: ACM, pp. 343–354. 10. Karrer B. & Newman M. E. (2011) Stochastic blockmodels and community structure in networks. Phys. Rev. E, 83, 016107. 11. Ramasco J. J. & Mungan M. (2008) Inversion method for content-based networks. Phys. Rev. E, 77, 036122. 12. Peixoto T. P. (2014) The graph-tool python library. figshare, https://figshare.com/articles/graph_tool/1164194 (last accessed 23 February 2018). 13. Von Luxburg U. (2007) A tutorial on spectral clustering. Stat. Comput., 17, 395–416. 14. Gleich D. (2006) Hierarchical Directed Spectral Graph Partitioning. Stanford, California: Stanford University. 15. Pentney W. & Meila M. (2005) Spectral clustering of biological sequence data. Proceedings of the 20th National Conference on Artificial Intelligence, AAAI'05. Palo Alto, California: AAAI Press, pp. 845–850. 16. Klymko C. & Sanders G. (2016) Detecting highly cyclic structure with complex eigenpairs. arXiv preprint arXiv:1609.05740. 17. Van Lierde H., Delvenne J.-C., Van Dooren P. & Saerens M. (2015) Spectral clustering algorithms for directed graphs.
Master's thesis, Université catholique de Louvain. 18. Conrad N. D., Banisch R. & Schütte C. (2015) Modularity of directed networks: cycle decomposition approach. J. Comput. Dynam., 2, 1–24. 19. Conrad N. D., Weber M. & Schütte C. (2016) Finding dominant structures of nonreversible Markov processes. Multiscale Model. Simul., 14, 1319–1340. 20. Wang Y. J. & Wong G. Y. (1987) Stochastic blockmodels for directed graphs. J. Am. Stat. Assoc., 82, 8–19. 21. Le Cam L. (1960) An approximation theorem for the Poisson binomial distribution. Pac. J. Math., 10, 1181–1197. 22. Beguerisse-Diaz M., Vangelov B. & Barahona M. (2013) Finding role communities in directed networks using role-based similarity, Markov stability and the relaxed minimum spanning tree. 2013 IEEE Global Conference on Signal and Information Processing. Piscataway, New Jersey: IEEE, pp. 937–940. 23. Balakrishnan R. & Ranganathan K. (2012) A Textbook of Graph Theory (Axler S., Capasso V., Casacuberta C., MacIntyre A. J., Ribet K., Sabbah C., Suli E. & Woyczynski W. A. eds), 2nd edn. New York, NY: Springer Science & Business Media. 24. Leskovec J., Rajaraman A. & Ullman J. D. (2014) Mining of Massive Datasets, 2nd edn. Cambridge, UK: Cambridge University Press. 25. De Mazancourt T. & Gerlic D. (1983) The inverse of a block-circulant matrix. IEEE Trans. Antennas Propag., 31, 808–810. 26. Seneta E. (1981) Non-Negative Matrices and Markov Chains, 2nd edn. New York, NY: Springer Science & Business Media. 27. Kirkland S. J. & Neumann M. (2012) Group Inverses of M-Matrices and Their Applications, 1st edn. Boca Raton, Florida: CRC Press. 28. Stewart G. W. & Sun J.-G. (1990) Matrix Perturbation Theory (Reinboldt W. & Siewiorek D.
eds), 1st edn. Cambridge, Massachusetts: Academic Press. 29. Saad Y. (1992) Numerical Methods for Large Eigenvalue Problems, 1st edn. Manchester, UK: Manchester University Press. 30. Golub G. H. & Van Loan C. F. (2012) Matrix Computations, vol. 3. Baltimore, Maryland: JHU Press. 31. Lloyd S. (1982) Least squares quantization in PCM. IEEE Trans. Inf. Theory, 28, 129–137. 32. Bellalij M., Saad Y. & Sadok H. (2007) On the Convergence of the Arnoldi Process for Eigenvalue Problems. Minneapolis, Minnesota: Minnesota Supercomputer Institute, University of Minnesota. 33. Rousseeuw P. J. (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53–65. 34. Brin S. & Page L. (2012) Reprint of: The anatomy of a large-scale hypertextual web search engine. Comput. Netw., 56, 3825–3833. 35. Kuhn H. W. (1955) The Hungarian method for the assignment problem. Nav. Res. Logist., 2, 83–97. 36. Knops Z. F., Maintz J. B. A., Viergever M. A. & Pluim J. P. W. (2006) Normalized mutual information based registration using k-means clustering and shading correction. Med. Image Anal., 10, 432–439. 37. Davies D. L. & Bouldin D. W. (1979) A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell., PAMI-1, 224–227. 38. Kotz S., Balakrishnan N. & Johnson N. L. (2005) Continuous Multivariate Distributions, Volume 1: Models and Applications, vol. 1, 1st edn. Hoboken, New Jersey: John Wiley & Sons. 39. Ester M., Kriegel H.-P., Sander J. & Xu X. (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD'96.
Palo Alto, California: AAAI Press, pp. 226–231. 40. Cormen T. H., Leiserson C. E., Rivest R. L. & Stein C. (2009) Introduction to Algorithms, 3rd edn. Cambridge, Massachusetts: MIT Press. 41. Butz S. D. (2008) Science of Earth Systems, 2nd edn. Boston, Massachusetts: Cengage Learning. 42. Pauly D. & Palomares M.-L. (2005) Fishing down marine food web: it is far more pervasive than we thought. Bull. Mar. Sci., 76, 197–211. 43. Penrose R. (1955) A generalized inverse for matrices. Mathematical Proceedings of the Cambridge Philosophical Society, vol. 51. Cambridge, UK: Cambridge University Press, pp. 406–413. 44. Ulanowicz R. E. & DeAngelis D. L. (1999) Network analysis of trophic dynamics in South Florida ecosystems. Proceedings of the South Florida Restoration Science Forum. Reston, Virginia: US Geological Survey, pp. 114–115. 45. Luckie M., Huffaker B., Dhamdhere A., Giotsas V. & Claffy K. C. (2013) AS relationships, customer cones, and validation. Proceedings of the 2013 Conference on Internet Measurement. New York, NY: ACM, pp. 243–256. 46. Winther M. (2006) Tier 1 ISPs: what they are and why they are important. IDC White Paper, NTT Communications. 47. Homans G. C. (1951) The Human Group, vol. 7, 1st edn. Abingdon, UK: Routledge & Kegan Paul Ltd. 48. Press W. H., Teukolsky S. A., Vetterling W. T. & Flannery B. P. (2007) Numerical Recipes 3rd Edition: The Art of Scientific Computing, 3rd edn. Cambridge, UK: Cambridge University Press. 49. Weyl H. (1912) Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung). Math. Ann., 71, 441–479. © The authors 2018. Published by Oxford University Press. All rights reserved.
Journal of Complex Networks, Advance Article, 7 May 2018. Oxford University Press.
Some directed networks in real-world applications have a block-acyclic structure: nodes can be partitioned into groups of nodes such that the connections between groups form an acyclic pattern as depicted in Fig. 1a. Such patterns are encountered in networks that tend to have a hierarchical structure such as trophic networks modelling predator–prey relationships [1] or networks of autonomous systems where edges denote money transfers between Internet Service Providers [2]. On the other hand, one may encounter directed graphs with a block-cyclic structure (Fig. 1b) when the network models a cyclic phenomenon such as the carbon cycle [3]. These two patterns are intimately related as the removal of a few edges from a block-cyclic graph makes it block-acyclic. This relationship is also observed in real-world networks: a graph of predator–prey interactions can be viewed as an acyclic version of the carbon cycle. In this article, we take advantage of this connection between the two types of patterns and formulate two closely related algorithms for the detection of groups of nodes respectively in block-acyclic and block-cyclic graphs in the presence of slight perturbations, in the form of edges that do not follow the block-cyclic or the block-acyclic pattern of connections in the graph. Fig. 1. (a) Block-acyclic graph and (b) block-cyclic graph. Labels of blocks in the block-acyclic graph denote the ranking of blocks (topological order of blocks in the block-acyclic graph). The partitioning of nodes in block-acyclic and block-cyclic networks can be viewed as a clustering problem. In graph mining, clustering refers to the task of grouping nodes that are similar in some sense. The resulting groups are called clusters.
In the case of directed graphs, the definition of similarity between two nodes may take the directionality of edges incident to these nodes into account. Clustering algorithms taking the directionality of edges into account may be referred to as pattern-based clustering algorithms, which extract pattern-based clusters [4]: such methods produce a result in which nodes within the same cluster have similar connections with other clusters. Groups of nodes in block-acyclic and block-cyclic graphs are examples of pattern-based clusters. Several approaches have been proposed for the detection of pattern-based clusters in directed graphs [4]. Popular families of methods for the detection of pattern-based clusters are random walk based algorithms, blockmodels (and more specifically stochastic blockmodels) and bibliometric symmetrization. Random walk based models are usually meant to detect density-based clusters [5]; however, by defining a two-step random walk as suggested in [6], pattern-based clusters such as blocks in block-cyclic graphs can also be detected. But the success of this method is guaranteed only when the graph is strongly connected, and the result is unreliable when the graph is sparse, with a high number of nodes with zero in-degree or out-degree. Blockmodelling approaches [7] are based on the definition of an image graph representing connections between blocks of nodes in a graph; the block membership is selected so that the corresponding image graph is consistent with the edges of the original graph. However, in existing algorithms the optimization process relies, for instance, on simulated annealing, hence the computational cost is high and there is a risk of falling into a local optimum. Moreover, this method may also fail when the graph is sparse. Clustering algorithms based on stochastic blockmodels detect clusters of nodes that are stochastically equivalent.
In particular, the method proposed in [8] estimates the block membership of nodes by defining a vertex embedding based on the extraction of singular vectors of the adjacency matrix, which turns out to be efficient compared to the common methods based on expectation maximization. However, the assumption of stochastic equivalence implies that the degrees of nodes within clusters exhibit a certain regularity, as shown further. Hence, this approach may yield poor results in detecting clusters in real-world block-cyclic and block-acyclic networks. A related category of methods is bibliometric symmetrization, which defines a node similarity matrix as a weighted sum of the co-coupling matrix $$WW^T$$ and the co-citation matrix $$W^TW$$ [9], where $$W$$ is the adjacency matrix of the graph. However, it may also fail when the degrees of nodes are not sufficiently regular within groups. To relax this assumption, degree-corrected versions with variables representing the degrees of nodes were proposed [10, 11]. But fitting these models relies on costly methods that do not eliminate the risk of falling into a local optimum (simulated annealing, local heuristics, etc.) [12]. Hence, methods based on random walks, bibliometric symmetrization and blockmodels, with or without degree correction, may yield poor results in the detection of blocks of nodes in block-cyclic and block-acyclic graphs, due to assumptions of connectivity or regularity or due to the computational difficulty of solving the associated optimization problems. The methods described in this article partly alleviate these weaknesses. In this article, we present two new clustering algorithms that extract clusters in block-cyclic and block-acyclic graphs in the presence of perturbing edges. The first algorithm, called the Block-Cyclic Spectral (BCS) clustering algorithm, is designed for the detection of clusters in block-cyclic graphs.
The second algorithm, referred to as Block-Acyclic Spectral (BAS) clustering algorithm is a slight extension of the first one that is able to detect clusters in block-acyclic graphs. Two types of perturbation are considered in our study: in the first case, the perturbing edges are uniformly distributed across the graph, while, in the second case, the perturbation is not uniform with some groups of nodes experiencing a higher volume of perturbing edges. A theoretical analysis of the effect of perturbing edges provides a sufficient condition on the perturbation for the success of our algorithms: in particular, when the perturbation is uniform, the condition specifies a maximum number of perturbing edges that can be tolerated by our BCS clustering algorithm to recover the blocks of a block-cyclic graph. Moreover, experimental results show that, even when the perturbation is not uniform, our algorithms are also successful and outperform other approaches. In particular, experiments on synthetic unweighted graphs show that our BCS clustering algorithm is able to recover the blocks perfectly even when appending $$\mathscr{O}(|E|)$$ perturbing edges to a block-cyclic graph containing $$|E|$$ edges.1 We apply the second algorithm to two real-world datasets: a trophic network in which the traditional classification of agents in an ecosystem is detected, from producers to top-level predators, and a worldwide network of autonomous systems depicting money transfers between Internet Service Providers. When tested on synthetic datasets, our algorithms produce smaller clustering errors than other state-of-the-art algorithms. Moreover, our methods only involve standard tools of linear algebra which makes them efficient in terms of time and space complexity. Simple heuristics are also provided in order to automatically determine the number of blocks in block-cyclic and block-acyclic graphs. 
Hence, the approach we follow differs from other clustering methods for directed graphs: we restrict ourselves to two patterns of connection (cyclic and acyclic) but we make no assumption of regularity (for instance, on the degrees of nodes). Moreover, our BCS and BAS clustering algorithms are able to focus on the cyclic or the acyclic patterns of connection respectively in block-cyclic or block-acyclic graphs, while neglecting any other pattern present in the graph. Our proposed algorithms are based on the computation of extremal eigenvalues and eigenvectors of a non-symmetric graph-related matrix, commonly called the transition matrix $$P$$. The proposed approaches amount to finding a set of right eigenvectors of $$P$$ verifying   $$Pu=\lambda u\text{ for }\lambda\in\mathbb{C}\text{, }|\lambda|\simeq 1\text{, }\lambda\neq 1$$ (1.1) and clustering the entries of these eigenvectors to recover the blocks of nodes. Hence, the process is similar to the well-known spectral clustering algorithm for the detection of clusters in undirected graphs, which is also based on the computation of extremal eigenvalues and eigenvectors of a graph-related matrix [13]. However, spectral clustering and extensions of spectral clustering to directed graphs are essentially based on the real spectrum of symmetric matrices associated to the graph [8, 14, 15]. In contrast, our method is based on the complex spectrum of a non-symmetric matrix. Hence, it keeps the intrinsically asymmetric information contained in directed graphs while having approximately the same time and space complexity as other spectral clustering algorithms. A recently published paper [16] exploits spectral information (in a different way than the present paper) for solving a related problem, the detection of block-cyclic components in the communities of a graph, with a special focus on the three-block case.
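The extraction step of Equation 1.1 can be illustrated on a toy block-cycle; this is a minimal sketch and not the full BCS algorithm of the paper, with numpy, scipy and scikit-learn assumed and the function name and example graph ours. On a block-cycle, the entries of the eigenvectors with $$|\lambda|\simeq 1$$, $$\lambda\neq 1$$ are constant within blocks, so k-means on them recovers the blocks:

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigs
from sklearn.cluster import KMeans

def cycle_eigenvector_embedding(W, k):
    """Embed nodes using right eigenvectors of the transition matrix P whose
    eigenvalues have modulus close to 1 but differ from 1 (Equation 1.1)."""
    W = csr_matrix(W, dtype=float)
    d = np.asarray(W.sum(axis=1)).ravel()
    P = diags(1.0 / np.where(d > 0, d, 1.0)) @ W   # row-stochastic P
    lam, U = eigs(P, k=k, which="LM")              # k largest-modulus pairs
    keep = np.abs(lam - 1.0) > 1e-6                # drop the eigenvalue 1
    return np.hstack([U[:, keep].real, U[:, keep].imag])

# toy block-cycle: 3 blocks of 4 nodes, every node of block s points to
# every node of block (s + 1) mod 3
W = np.zeros((12, 12))
for s in range(3):
    t = (s + 1) % 3
    W[4 * s:4 * s + 4, 4 * t:4 * t + 4] = 1.0
X = cycle_eigenvector_embedding(W, 3)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)
```

Here the two retained eigenvectors correspond to the eigenvalues $$e^{\pm 2\pi i/3}$$ of $$P$$; nodes of the same block receive identical rows in the embedding, so the k-means labels coincide with the blocks.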
In contrast, we focus on networks with a global block-cyclic structure and extend our method for the detection of acyclic structures, which we deem even more relevant than block-cyclicity in practical situations. Part of the results presented here were derived in an unpublished technical report [17]. This article offers more empirical validation and comparison with state-of-the-art competing techniques. The structure of this article is as follows. In Section 2, we describe related clustering methods for directed graphs. In Section 3, we present our BCS clustering algorithm for the detection of clusters in block-cyclic graphs. Then we describe the links between block-cyclic and block-acyclic graphs in Section 4. In Section 5, the BAS clustering algorithm is introduced for the detection of clusters in block-acyclic graphs. In Section 6, we analyse the performance of the BCS and BAS clustering algorithms on synthetic data. Finally, in Section 7, we apply the BAS clustering algorithm to a real-world trophic network and a network of autonomous systems. 2. Related work In this section, we present existing algorithms related to our work, including the classical spectral clustering algorithm and some existing algorithms for clustering directed graphs. In subsequent sections, we refer to a weighted directed graph as a triplet $$G=(V,E,W)$$ where $$V$$ is a set of nodes, $$E\subseteq V\times V$$ is a set of directed edges and $$W\in\mathbb{R}^{n\times n}_+$$ is a matrix of positive edge weights such that $$W_{uv}$$ is the weight of edge $$(u,v)$$ and, for each $$u,v\in V$$,   $$W_{uv}>0\Leftrightarrow (u,v)\in E.$$ (2.1) When the graph is unweighted, we refer to it as a pair $$G=(V,E)$$ and the binary adjacency matrix is kept implicit. Moreover, when the graph is undirected, we have $$W=W^T$$. Finally, we refer to a right (resp.
left) eigenvector of a matrix $$A\in\mathbb{C}^{n\times n}$$ associated to an eigenvalue $$\lambda\in\mathbb{C}$$ as a vector $$u\in\mathbb{C}^n\setminus\{0\}$$ such that $$Au=\lambda u$$ (resp. $$u^HA=\lambda u^H$$). 2.1 Spectral clustering of undirected graphs Spectral clustering uses eigenvalues and eigenvectors of a graph-related matrix (the Laplacian) to detect density-based clusters of nodes in an undirected graph, namely clusters with a high number of intra-cluster edges and a low number of inter-cluster edges [13]. The method can be decomposed into two steps. First, the nodes of the graph are mapped to points in a Euclidean space such that nodes that are likely to lie in the same cluster are mapped to points that are close to each other in this projection space. The second step of the method involves clustering the $$n$$ points in $$\mathbb{R}^k$$ using the k-means algorithm. The algorithm is based on spectral properties of the graph Laplacian $$L\in\mathbb{R}^{n\times n}$$ defined by   $$L_{uv}=\left\lbrace\begin{array}{ll} 1-\frac{W_{uu}}{d_u}&\text{ if }u=v\text{ and }d_u\neq 0\\ -\frac{W_{uv}}{\sqrt{d_ud_v}}&\text{ if }u\text{ and }v\text{ are adjacent}\\ 0&\text{ otherwise} \end{array}\right.$$ (2.2) where $$W$$ is the adjacency matrix of the graph and $$d_u$$ is the degree of node $$u$$. If the target number of clusters is $$k$$, the right eigenvectors associated to the $$k$$ smallest eigenvalues of the Laplacian are extracted and stored as the columns of a matrix $$U\in\mathbb{R}^{n\times k}$$; the embeddings of the nodes are then given by the $$n$$ rows of $$U$$. One can justify this method in the following way. When clusters are disconnected, namely when the graph contains $$k$$ connected components, the rows of $$U$$ associated to nodes belonging to the same component are identical [13].
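This claim about disconnected components is easy to check numerically. A sketch of the embedding step based on Equation 2.2, with numpy and scikit-learn assumed and the function name and example graph ours:

```python
import numpy as np
from sklearn.cluster import KMeans

def laplacian_embedding(W, k):
    """Rows of U: eigenvectors of the normalized Laplacian of Equation 2.2
    associated to the k smallest eigenvalues."""
    d = W.sum(axis=1)
    inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    L = np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    return vecs[:, :k]                 # one k-dimensional embedding per node

# two disconnected triangles: the graph has 2 connected components
W = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[a, b] = W[b, a] = 1.0
U = laplacian_embedding(W, 2)
# rows of U coincide within each component, so k-means recovers them
labels = KMeans(n_clusters=2, n_init=10).fit_predict(U)
```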
Hence, when perturbing this disconnected graph, namely in the presence of clusters with a low number of inter-cluster edges, nodes within the same clusters are mapped to points that tend to form clusters in $$\mathbb{R}^k$$. This is explained by the semi-simplicity of the eigenvalue $$0$$ of the graph Laplacian, which implies the continuous dependence of the associated right eigenvectors on the weights of edges in the graph [13]. In Sections 3 and 5, we show that a similar approach can be used to extract clusters in directed graphs with a cyclic or an acyclic pattern of connection between clusters: we also use the spectrum of a graph-related matrix to map nodes to points in a Euclidean space and cluster these points with the k-means algorithm. 2.2 Clustering algorithms for directed graphs In this section, we describe existing clustering algorithms for the detection of pattern-based clusters in directed graphs, namely groups of nodes with similar connections to other groups in some sense. We focus on methods that are theoretically able to extract blocks from block-cyclic and block-acyclic graphs. Bibliometric symmetrization refers to a symmetrization of the adjacency matrix $$W$$ of $$G$$. The symmetrized matrix $$(1-\alpha)W^TW+\alpha WW^T$$ is defined as the adjacency matrix of an undirected graph $$G_u$$ for a certain choice of weighting parameter $$\alpha$$. This symmetric adjacency matrix is a linear combination of the co-coupling matrix $$WW^T$$ and the co-citation matrix $$W^TW$$. Then clustering methods for undirected graphs, such as a spectral clustering algorithm, are applied to $$G_u$$. This method is well suited to the detection of co-citation networks [9]. The primary focus of random walk based clustering algorithms is the detection of density-based clusters [5], namely clusters with a high number of intra-cluster edges and a low number of inter-cluster edges.
A symmetric Laplacian matrix for directed graphs based on the stationary probability distribution of a random walk is defined, and applying a classical spectral clustering algorithm to this Laplacian matrix leads to the extraction of clusters in which a random walker is likely to be trapped. To detect pattern-based clusters, an extension of this method was proposed in which a random walker alternately moves forward, following the directionality of edges, and backwards, in the opposite direction [6]. This method successfully extracts clusters in citation-based networks. Similarly, another random walk-based approach extends the definition of directed modularity to extract clusters of densely connected nodes with a cyclic pattern of connections between clusters [18, 19]. The blockmodelling approach is based on the extraction of functional classes from networks [7]. Each class corresponds to a node in an image graph, which describes the functional roles of classes of nodes and the overall pattern of connections between classes of nodes. A measure of how well a given directed graph fits to an image graph is proposed. The optimal partitioning of nodes and image graph are obtained by maximizing this quality measure using alternating optimization combined with simulated annealing. Methods based on stochastic blockmodels were first defined for undirected networks and then extended to directed graphs in [20]. A stochastic blockmodel is a random graph model. For a number $$k$$ of blocks, the parameters of a stochastic blockmodel are a vector of probabilities $$\gamma\in[0,1]^k$$ and a matrix $$\Phi\in [0,1]^{k\times k}$$. Each node is randomly assigned to a block with probabilities specified by $$\gamma$$ and the probability of having an edge $$(i,j)$$ for $$i$$ in block $$s$$ and $$j$$ in block $$t$$ is $$\Phi_{st}$$. For this reason, nodes within a block are said to be stochastically equivalent.
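To make the generative model concrete, the following sketch (with a hypothetical helper name) samples a directed graph from a stochastic blockmodel with parameters $$\gamma$$ and $$\Phi$$ as defined above.

```python
import numpy as np

def sample_directed_sbm(n, gamma, Phi, seed=0):
    """Sample a directed graph from a stochastic blockmodel: node i is
    assigned to block z[i] with probabilities gamma, and edge (i, j)
    appears independently with probability Phi[z[i], z[j]]."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(gamma), size=n, p=gamma)       # block memberships
    A = (rng.random((n, n)) < Phi[z][:, z]).astype(int)
    np.fill_diagonal(A, 0)                            # no self-loops
    return A, z
```

Choosing $$\Phi$$ as a cyclic 0/1 matrix, for instance, yields a block-cycle in which nodes within a block are stochastically equivalent.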
One of the various methods for the detection of blocks in graphs generated by a stochastic blockmodel is based on the extraction of singular vectors of the adjacency matrix [8], which is similar to the bibliometric symmetrization combined with the classical spectral clustering algorithm. The common definition of the stochastic blockmodel implies that in- and out-degrees of nodes within blocks follow a Poisson binomial distribution [10, 21] and thus have the same expected value. As this assumption is not verified in most real-world directed networks, [12] proposed a degree-corrected stochastic blockmodel for directed graphs where additional variables are introduced, allowing more flexibility in the distribution of degrees of nodes within blocks. The partitioning of nodes is obtained by an expectation maximization process. Other statistical methods exist, among which is the so-called clustering algorithm for content-based networks [11]. This method is similar to stochastic blockmodelling but, instead of block-to-block probabilities of connection, it is based on node-to-block and block-to-node probabilities. The model parameters are adjusted through an expectation maximization algorithm. This approach can be viewed as another degree-corrected stochastic blockmodel; hence it is robust to high variations in the degrees of nodes, but it also involves a more complex optimization approach due to the higher number of parameters. Finally, some methods are based on the detection of roles in directed networks, such as [22], which defines the number of paths of given lengths starting or ending in a node as its features, from which node similarities are extracted. As we will see, our definition of block-cyclic and block-acyclic graphs does not include any constraint on the regularity of node features such as the number of incoming or outgoing paths. We are interested in the detection of clusters in block-cyclic and block-acyclic graphs.
Apart from the role-based model, the methods described above are all theoretically able to extract such clusters. Methods based on bibliometric symmetrization and stochastic blockmodels are able to detect such structures whenever the assumption of stochastic equivalence between nodes within blocks is verified. Provided that the graph is strongly connected, the method based on the two-step random walk can also be used. If the degrees of nodes are large enough, the blockmodelling approach is also successful. However, the benchmark tests presented in Section 6 show that our algorithms outperform all these methods in the presence of high perturbations or when these assumptions are not fulfilled. 3. Spectral clustering algorithm for block-cycles In this section, we describe a method for the extraction of blocks of nodes in block-cyclic graphs (or block-cycles). We recall that a block-cycle is a directed graph where nodes can be partitioned into non-empty blocks with a cyclic pattern of connections between blocks. We provide a formal definition of a block-cycle below. Definition 3.1 (Block-cycle) A directed graph $$G=(V,E,W)$$ is a block-cycle of $$k$$ blocks if it contains at least one directed cycle of length $$k$$ and if there exists a function $$\tau:V\rightarrow\{1,\ldots,k\}$$ partitioning the nodes of $$V$$ into $$k$$ non-empty subsets, such that   $$E\subseteq\{(u,v)\text{ : }(\tau(u),\tau(v))\in \mathscr{C}\}$$ (3.1) where $$\mathscr{C}=\{(1,2),(2,3),\ldots,(k-1,k),(k,1)\}$$. Due to the equivalence between the existence of clusters in a graph and the block structure of the adjacency matrix, we use the terms ‘cluster’ and ‘block’ interchangeably. We also use the terms ‘block-cycle’ and ‘block-cyclic graph’ interchangeably. Figure 1b displays an example of a block-cycle. The blocks may contain any number of nodes (other than zero) and there can be any number of edges connecting a pair of consecutive blocks in the cycle. The definition implies that any block-cycle is k-partite [23].
However, the converse is not true as the definition of a block-cycle includes an additional constraint on the directionality of edges. It is worth mentioning that, in the general case, a given block-cycle is unlikely to derive from a stochastic blockmodel, in which nodes within a block are stochastically equivalent [8]. Indeed, as mentioned before, stochastic equivalence implies that the degrees of nodes within the same block are identically distributed. Our definition does not include such a regularity assumption. In the remainder of this section, we first analyse spectral properties of block-cycles. Then we formulate our spectral clustering algorithm for the detection of blocks in block-cycles. 3.1 Spectral properties of block-cycles Up to a permutation of blocks, the adjacency matrix of a block-cycle is a block circulant matrix with nonzero blocks in the upper diagonal and in the bottom-left corner, as depicted in Fig. 2a. Given a perturbed block-cycle, our goal is to recover the partitioning of nodes into blocks, namely to provide an estimation $$\hat{\tau}$$ of $$\tau$$. We assume that a perturbed block-cycle is obtained by randomly appending edges to a block-cycle. Different distributions of the perturbing edges are considered: in particular, when the perturbing edges are uniformly distributed across the graph, a sufficient condition is provided, giving the maximum allowed number of perturbing edges that guarantees the success of our method. Experiments described in Section 6 also demonstrate the effectiveness of our method when the perturbation is not uniform (i.e. when the perturbing edges preferentially target some specific nodes). Fig. 2. Adjacency matrix (a) and complex spectrum of the transition matrix (b) of a block-cycle of $$8$$ blocks.
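The structure of Definition 3.1 can also be made concrete by explicit construction. The following sketch (a hypothetical helper, not part of the proposed algorithm) builds a block-cycle with given block sizes, keeping each allowed edge independently with probability $$p$$.

```python
import numpy as np

def random_block_cycle(sizes, p=1.0, seed=0):
    """Build a block-cycle (Definition 3.1): nodes are split into k
    consecutive blocks of the given sizes, and an edge (u, v) is allowed
    only when tau(v) follows tau(u) cyclically; each allowed edge is
    kept independently with probability p."""
    rng = np.random.default_rng(seed)
    k, n = len(sizes), sum(sizes)
    tau = np.repeat(np.arange(k), sizes)               # block assignment
    allowed = ((tau[:, None] + 1) % k) == tau[None, :] # block s -> block s+1 (cyclic)
    A = (allowed & (rng.random((n, n)) < p)).astype(int)
    return A, tau
```

With $$p=1$$ every allowed edge is present, so all out-degrees are positive and the adjacency matrix has the block circulant shape of Fig. 2a.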
To detect blocks in a block-cycle, we use the spectrum of a graph-related matrix, the transition matrix $$P\in\mathbb{R}^{n\times n}$$ associated to the Markov chain based on the graph, defined by   $$P_{ij}=\left\lbrace\begin{array}{ll} \frac{W_{ij}}{d_i^{out}}&\text{ if }d_i^{out}\neq 0\\ 0&\text{ otherwise} \end{array}\right.$$ (3.2) where $$W$$ is the weight matrix and $$d_i^{out}=\sum_jW_{ij}$$ is the out-degree of node $$i$$. The transition matrix is row stochastic, namely $$P\mathbf{1}=\mathbf{1}$$ where $$\mathbf{1}$$ represents the vector of ones. The basic spectral property of the transition matrix is that all its complex eigenvalues lie in the ball $$\{x\in\mathbb{C}\text{ : }\Vert x\Vert_2\leq 1\}$$ regardless of the associated graph [24]. This property, combined with the fact that the transition matrix of a block-cycle is block circulant [25], makes it possible to prove the following lemma. We make the assumption that $$d_i^{out}>0$$ for any node $$i$$. Lemma 3.1 Let $$G=(V,E,W)$$ be a block-cycle with $$k$$ blocks $$V_1,\ldots,V_k$$ such that $$d_i^{out}>0$$ for all $$i\in V$$. Then $$\lambda_l=e^{-2\pi i\frac{l}{k}}\in spec(P)$$ for all $$0\leq l\leq k-1$$, namely there are $$k$$ eigenvalues located on a circle centered at the origin and with radius $$1$$ in the complex plane. The right eigenvector associated to the eigenvalue $$e^{-2\pi i\frac{l}{k}}$$ is   $$u^l_j=\left\{ \begin{array}{ll} \frac{1}{\sqrt{n}}e^{2\pi i\frac{lk}{k}} & j\in V_1\\ \frac{1}{\sqrt{n}}e^{2\pi i\frac{l(k-1)}{k}} & j\in V_2\\ \vdots & \\ \frac{1}{\sqrt{n}}e^{2\pi i\frac{l}{k}} & j\in V_k \end{array}\right.$$ (3.3) for which $$Pu^l=\lambda_lu^l$$ and $$\Vert u^l\Vert_2=1$$. Moreover, if $$G$$ is strongly connected, then the eigenvalues $$\lambda_0,\ldots,\lambda_{k-1}$$ have multiplicity $$1$$ and all other eigenvalues of $$P$$ have a modulus strictly lower than $$1$$. We refer to these eigenvalues and eigenvectors as the cycle eigenvalues and the right cycle eigenvectors.
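Lemma 3.1 is easy to check numerically. The sketch below builds the transition matrix of a toy block-cycle with $$k=4$$ blocks (a hypothetical instance with every allowed edge present) and computes its spectrum, which should contain all fourth roots of unity.

```python
import numpy as np

def transition_matrix(W):
    """Row-stochastic transition matrix P of Equation 3.2
    (all out-degrees assumed positive)."""
    return W / W.sum(axis=1, keepdims=True)

# toy block-cycle: k = 4 blocks of 3 nodes each, every allowed edge present
k, m = 4, 3
tau = np.repeat(np.arange(k), m)
W = (((tau[:, None] + 1) % k) == tau[None, :]).astype(float)
P = transition_matrix(W)
eigvals = np.linalg.eigvals(P)
# Lemma 3.1 predicts e^{-2 pi i l / k} in spec(P) for l = 0, ..., k-1
cycle_eigs = [np.exp(-2j * np.pi * l / k) for l in range(k)]
```

All remaining eigenvalues of this toy instance have modulus strictly smaller than $$1$$, as the lemma states for strongly connected block-cycles.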
This spectral property is illustrated in Fig. 2b displaying the eigenvalues of the transition matrix of a block-cycle of eight blocks. Hence, computing the cycle eigenvalues and storing the corresponding right eigenvectors of $$P$$ as the columns of a matrix $$U\in\mathbb{C}^{n\times k}$$, the rows $$\{x^j\text{ : }1\leq j\leq n\}$$ of $$U$$ define vector representations of the nodes of $$G$$. To recover the $$k$$ blocks of the block-cycle, we observe that the embeddings $$\{x^j\text{ : }j\in V_s\}$$ of the nodes in block $$V_s$$ are identical, being all equal to   $$c^s=\frac{1}{\sqrt{n}}\left[1,e^{\frac{2\pi i}{k}(k-s+1)},e^{\frac{4\pi i}{k}(k-s+1)},\ldots,e^{\frac{2\pi i}{k}(k-1)(k-s+1)}\right]\!.$$ (3.4) We refer to the set of vectors $$\{c^s\text{ : }0\leq s\leq k-1\}$$ as the block centroids of the block-cycle. In the presence of perturbing edges, the vectors $$\{x^j\text{ : } j\in V_s\}$$ are no longer identical but they form a cluster around the block centroid $$c^s$$. Hence, a way to recover the blocks of a perturbed block-cycle is to cluster the vectors $$\{x^j\text{ : }1\leq j\leq n\}$$ by assigning each node $$j$$ to the nearest block centroid, namely   $$\hat{\tau}(j)\leftarrow \underset{s}{\text{argmin}}\Vert x^j-c^s\Vert_2$$ (3.5) where $$\hat{\tau}(j)$$ is the estimated block assignment of node $$j$$. Theorem 3.2 justifies the approach by quantifying the effect of additive perturbations on the spectrum of the transition matrix of a block-cycle: starting with an unperturbed block-cycle $$G$$, we consider a graph $$\tilde{G}$$ obtained by appending perturbing edges to $$G$$ and we provide upper bounds on the perturbation of the cycle eigenvalues, the right cycle eigenvectors and the node embeddings $$\{x^j\text{ : }1\leq j\leq n\}$$. These bounds are further used to provide a sufficient condition on the perturbation (Equation 3.12 below) for the method to succeed in recovering the blocks of nodes.
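The embedding-and-assignment step of Equations 3.4 and 3.5 can be sketched as follows. This is a simplified illustration, not the full algorithm: it assumes the graph is a strongly connected block-cycle and that node $$0$$ belongs to $$V_1$$, which lets us fix the arbitrary complex phase of each computed eigenvector using its first entry.

```python
import numpy as np

def cycle_embedding(P, k):
    """Columns of U: right eigenvectors of P for the cycle eigenvalues
    e^{-2 pi i l / k}; rows of U: node embeddings x^j.  Each eigenvector
    returned by eig has an arbitrary phase, fixed here so its first entry
    is real positive (valid when node 0 lies in block V_1)."""
    vals, vecs = np.linalg.eig(P)
    cols = []
    for l in range(k):
        u = vecs[:, np.argmin(np.abs(vals - np.exp(-2j * np.pi * l / k)))]
        u = u / (u[0] / abs(u[0]))               # fix the complex phase
        cols.append(u / np.linalg.norm(u))
    return np.column_stack(cols)

def assign_blocks(X, k):
    """Equation 3.5: map each embedded node to its nearest centroid c^s."""
    n = X.shape[0]
    s = np.arange(1, k + 1)[:, None]
    l = np.arange(k)[None, :]
    C = np.exp(2j * np.pi * l * (k - s + 1) / k) / np.sqrt(n)   # Equation 3.4
    return np.argmin(np.linalg.norm(X[:, None, :] - C[None], axis=2), axis=1)
```

On an unperturbed block-cycle the embedded rows coincide exactly with the centroids, so the assignment recovers $$\tau$$ exactly.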
Theorem 3.2 Let $$G=(V,E,W)$$ be a strongly connected block-cycle with $$k$$ blocks $$V_1,\ldots,V_k$$ such that $$d_i^{out}>0$$ for all $$i\in V$$, let $$\lambda_0,\ldots,\lambda_{k-1}$$ be the $$k$$ cycle eigenvalues and $$u^0,\ldots,u^{k-1}$$ be the corresponding right cycle eigenvectors. Let $$\hat{G}=(V,\hat{E},\hat{W})$$ be a perturbed version of $$G$$ formed by appending positively weighted edges (excluding self-loops) to $$G$$. Let $$P$$ and $$\hat{P}$$ denote the transition matrices of $$G$$ and $$\hat{G}$$ respectively. We define the quantities   $$\begin{array}{rcl} \sigma &=&\underset{(i,j)\in\hat{E}}{\max}\text{ }\frac{\hat{d}_j^{in}}{\hat{d}_i^{out}}\\ \rho &=&\underset{i}{\max}\text{ }\frac{\hat{d}_i^{out}-d_i^{out}}{d_i^{out}} \end{array}$$ (3.6) where $$d^{in}_i$$, $$d^{out}_i$$, $$\hat{d}^{in}_i$$ and $$\hat{d}^{out}_i$$ represent the in-degrees and out-degrees of the $$i$$-th node in $$G$$ and $$\hat{G}$$, respectively. Then, 1. for any cycle eigenvalue $$\lambda_l\in spec(P)$$, there exists an eigenvalue $$\hat{\lambda}_l\in spec(\hat{P})$$ so that   $$\left\vert\hat{\lambda}_l-\lambda_l\right\vert \leq \sqrt{2n}\Vert f\Vert_2\sigma^{\frac{1}{2}}\rho^{\frac{1}{2}}+\mathscr{O}\left(\sigma\rho\right)$$ (3.7) where $$f$$ is the (left) Perron eigenvector of $$P$$, namely the vector $$f$$ verifying $$f^TP=f^T$$ with $$f^T\mathbf{1}=1$$, 2. there exists a right eigenvector $$\hat{u}^l$$ of $$\hat{P}$$ associated to the eigenvalue $$\hat{\lambda}_l$$ (i.e. for which $$\hat{P}\hat{u}^l=\hat{\lambda}_l\hat{u}^l$$) verifying   $$\Vert\hat{u}^l-u^l\Vert_2\leq \sqrt{2}\Vert (\lambda_lI-P)^{\#}\Vert_2 \sigma^{\frac{1}{2}}\rho^{\frac{1}{2}}+\mathscr{O}\left(\sigma\rho\right)$$ (3.8) where $$u^l$$ is the right eigenvector of $$P$$ associated to the eigenvalue $$\lambda_l$$ and $$(\lambda_lI-P)^{\#}$$ denotes the Drazin generalized inverse of $$(\lambda_lI-P)$$, 3.
the node embeddings $$\{x^1,\ldots,x^n\}\subset\mathbb{C}^k$$ and $$\{\hat{x}^1,\ldots,\hat{x}^n\}\subset\mathbb{C}^k$$ defined as the rows of the right eigenvector matrices $$U=[u^0,\ldots,u^{k-1}]\in\mathbb{C}^{n\times k}$$ and $$\hat{U}=[\hat{u}^0,\ldots,\hat{u}^{k-1}]\in\mathbb{C}^{n\times k}$$, respectively, verify   $$\Vert x^j - \hat{x}^j\Vert_2 \leq k\sqrt{2\sigma\rho}\underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2+\mathscr{O}\left(\sigma\rho\right)$$ (3.9) for each $$j\in\{1,\ldots,n\}$$. In addition to Theorem 3.2, Lemma 3.2 shows that the block centroids of a block-cycle are equidistant, with a pairwise Euclidean distance equal to $$\sqrt{2k/n}$$. Lemma 3.2 The $$k$$ distinct row vectors $$c^0,\ldots,c^{k-1}$$ that constitute the set of the rows of the eigenvector matrix $$U$$ in the case of an unperturbed block-cycle (defined by Equation 3.4) are pairwise equidistant and, for all $$r\neq s\in\{0,\ldots,k-1\}$$,   $$\Vert c^r-c^s\Vert_2=\sqrt{\frac{2k}{n}}.$$ (3.10) Lemma 3.2 combined with the third claim of Theorem 3.2 provides a sufficient condition for the success of the method expressed by Equation 3.5 that assigns each node to the block with the nearest associated centroid.
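Lemma 3.2 can be verified numerically. The sketch below builds the $$k$$ centroids of Equation 3.4 for an arbitrary (hypothetical) choice of $$n$$ and $$k$$ and computes all pairwise distances.

```python
import numpy as np

# centroids of Equation 3.4 for a hypothetical block-cycle with n = 20, k = 5
n, k = 20, 5
s = np.arange(1, k + 1)[:, None]
l = np.arange(k)[None, :]
C = np.exp(2j * np.pi * l * (k - s + 1) / k) / np.sqrt(n)   # row s-1 = c^s
# Lemma 3.2 predicts that all pairwise distances equal sqrt(2k/n)
dists = [np.linalg.norm(C[r] - C[t])
         for r in range(k) for t in range(r + 1, k)]
```

The equidistance follows from the fact that the entrywise differences are sums of $$k$$-th roots of unity, whose squared magnitudes sum to $$2k$$.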
Indeed, to ensure that each node vector $$\hat{x}^j$$ is located closer to its block centroid $$c^{\tau(j)}$$ than to other block centroids, and if we neglect higher order terms in Equation 3.9, a sufficient condition is to ensure that the distance $$\Vert x^j-\hat{x}^j\Vert_2$$ does not exceed half of the pairwise distance $$\sqrt{2k/n}$$ between the centroids, namely   $$k\sqrt{2\sigma\rho} \underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2\leq \frac{1}{2}\sqrt{\frac{2k}{n}}$$ (3.11) or equivalently   $$\rho\leq \frac{1}{4kn\sigma}\left(\underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2^2\right)^{-1}.$$ (3.12) If the relative perturbation $$\rho$$ on the out-degrees does not exceed the bound expressed by the right-hand side of the inequality, the success of the clustering method expressed by Equation 3.5 is guaranteed. We make a few comments about the different quantities involved in the perturbation bounds of Theorem 3.2 and the sufficient condition expressed by Equation 3.12. The quantities $$\sigma$$ and $$\rho$$ both depend on the presence of perturbing edges. The quantity $$\sigma$$ measures the discrepancy between the in-degree of the destination and the out-degree of the origin over all edges in the perturbed graph and hence it is greater than $$1$$. It is close to $$1$$ in the particular case of a perturbed block-cycle with a uniform perturbation and if the block-cycle has homogeneous degrees. The quantity $$\rho$$ measures the relative discrepancy between out-degrees in the presence and in the absence of perturbation. In the particular case of a block-cycle with homogeneous out-degrees approximately equal to $$d^{out}$$, and a uniform perturbation of $$\alpha |E|$$ edges in total (for some $$\alpha>0$$), corresponding to $$\alpha d^{out}$$ perturbing out-edges for each node, we have $$\rho\simeq \alpha$$.
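The quantities $$\sigma$$ and $$\rho$$ of Equation 3.6 are straightforward to compute from the two weight matrices; the following is a minimal sketch for the unweighted case, with a small hypothetical example in which one perturbing edge is added to a directed 3-cycle.

```python
import numpy as np

def perturbation_quantities(W, W_hat):
    """sigma and rho of Equation 3.6, given the weight matrix W of a
    block-cycle and the weight matrix W_hat of its perturbed version
    (same node set, edges only added, all out-degrees positive)."""
    d_out = W.sum(axis=1)
    d_out_hat = W_hat.sum(axis=1)
    d_in_hat = W_hat.sum(axis=0)
    i, j = np.nonzero(W_hat)
    sigma = np.max(d_in_hat[j] / d_out_hat[i])   # max over edges of hat{G}
    rho = np.max((d_out_hat - d_out) / d_out)    # relative out-degree increase
    return sigma, rho
```

For the directed 3-cycle with one extra edge $$(0,2)$$, node $$0$$ doubles its out-degree, giving $$\rho=1$$, and the edge $$(1,2)$$ attains $$\sigma=2$$.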
In that particular case, Equation 3.12 states that the method is successful for a relative perturbation magnitude $$\alpha$$ satisfying   $$\frac{|\tilde{E}|}{|E|}=\alpha\leq \frac{1}{4kn}\left(\underset{l=0,\ldots,k-1}{\max}\left\Vert (\lambda_lI-P)^\#\right\Vert_2^2\right)^{-1}$$ (3.13) where $$\tilde{E}$$ represents the perturbing edges and $$E$$ represents the edges in the original block-cycle. In the first claim of Theorem 3.2 regarding the perturbation of the cycle eigenvalues, $$f$$ is the Perron eigenvector [26] with unit $$1$$-norm (i.e. the real-valued vector with nonnegative entries verifying $$f^TP=f^T$$ and $$f^T\mathbf{1}=1$$). Thus $$\frac{1}{\sqrt{n}}\leq \Vert f\Vert_2\leq 1$$ and $$\Vert f\Vert_2=\frac{1}{\sqrt{n}}$$ when $$f$$ is constant, namely when the stationary probability of the random walk associated to $$P$$ is uniform over the vertices. This is the case, for instance, for a block-cycle generated by a stochastic blockmodel with similar probabilities of transitions between any pair of consecutive blocks in the block-cycle. Regarding the perturbation of the right eigenvectors and node embeddings (second and third claims of Theorem 3.2), Inequality 3.8 follows from the fact that the cycle eigenvalues are simple. Providing bounds on the norm of the Drazin generalized inverse of a non-symmetric matrix is tedious since it depends on the condition number of each eigenvalue of the matrix (see for instance [27] for bounds on the norm of $$(I-P)^{\#}$$ for a stochastic matrix $$P$$).
However, based on [28], we show in Appendix A.2 that, when $$P$$ is diagonalizable, the norm of the Drazin generalized inverse of $$(\lambda_lI-P)$$ verifies   $$\Vert (\lambda_lI-P)^{\#}\Vert_2\leq (n-1)\left(\underset{\lambda\in \text{spec}(P)\setminus\{\lambda_l\}}{\min}|\lambda-\lambda_l|\right)^{-1} \Vert Y\Vert_2$$ (3.14) where $$\{u^r\text{ : }1\leq r\leq n\}$$ and $$\{y^r\text{ : }1\leq r\leq n\}$$ represent the right and left eigenvectors of $$P$$ such that $$\Vert u^1\Vert_2=\ldots=\Vert u^n\Vert_2=1$$ and $$(y^1)^Hu^1=\ldots=(y^n)^Hu^n=1$$, and $$Y=[y^1,\ldots,y^{l-1},y^{l+1},\ldots,y^n]$$ (concatenation of left eigenvectors). For the particular case of a block-cycle, whenever the perturbation is low and the number of blocks is greater than $$7$$, we show in Appendix A.2 that the eigenvalue closest to a cycle eigenvalue is the closest other cycle eigenvalue, hence   $$\underset{\lambda\in \text{spec}(P)\setminus\{\lambda_l\}}{\min}|\lambda-\lambda_l|=2\sin\left(\frac{\pi}{k}\right)\!.$$ (3.15) Moreover, the empirical observations described in Appendix A.2 show that   $$\Vert Y\Vert_2\lessapprox (n-1)\sqrt{n}\Vert f\Vert_2.$$ (3.16) Combining the results of Equations 3.14, 3.15 and 3.16, we conclude the following. The perturbation bounds expressed by Equations 3.7, 3.8 and 3.9 in Theorem 3.2 are small for low values of $$k$$ and for low values of $$\sigma$$, $$\rho$$ and $$\Vert f\Vert_2$$. For instance, this is typically the case when the perturbing edges are uniformly distributed across the graph and when the block-cycle consists of balanced blocks with similar numbers of edges among the consecutive blocks of the block-cycle. Similarly, the sufficient condition bounding the maximum allowed perturbation on the out-degrees of the nodes (Inequality 3.12) is looser when $$\sigma$$ and $$\Vert f\Vert_2$$ are low.
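The gap of Equation 3.15 is elementary geometry on the unit circle: consecutive cycle eigenvalues subtend an angle of $$2\pi/k$$, and the corresponding chord has length $$2\sin(\pi/k)$$. A quick numerical sanity check:

```python
import numpy as np

# minimum distance between distinct cycle eigenvalues e^{-2 pi i l / k},
# attained by consecutive eigenvalues: the chord length 2*sin(pi/k)
gaps = []
for k in range(3, 12):
    lams = np.exp(-2j * np.pi * np.arange(k) / k)
    gaps.append(min(abs(lams[l] - lams[(l + 1) % k]) for l in range(k)))
```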
For a uniformly perturbed block-cycle with balanced blocks and connections, the sufficient condition becomes Inequality 3.13, which restricts the allowed number of perturbing edges. In practice, experiments on synthetic graphs described in Section 6 show that the method is also able to identify the blocks when the perturbation is structured and not uniform (e.g. generated by a stochastic blockmodel with non-uniform probabilities of connection) and when the degrees of the nodes of the block-cycle are not homogeneous. In general, Equation 3.8 implies that the right cycle eigenvectors of a block-cycle vary continuously as functions of the entries of the transition matrix and hence of the edges’ weights. Although the bounds provided by Theorem 3.2 can be quite loose in practice, the continuity of the cycle eigenvalues and the right cycle eigenvectors is verified for any strongly connected block-cycle. This continuity property provides the theoretical foundation of our spectral clustering algorithm. It is worth m