Journal of Complex Networks, Volume 6 (1) – Feb 1, 2018

30 pages


- Publisher: Oxford University Press
- Copyright: Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
- ISSN: 2051-1310
- eISSN: 2051-1329
- DOI: 10.1093/comnet/cnx020

Abstract

This research creates a test of hypothesis to detect subtle degradation within a Barabási–Albert network attributable to nodal or edge deletion. For this purpose, degradation is defined as the removal of nodes or edges that changes the structure of the network and, consequently, its degree distribution. To achieve this objective, L-moments of the degree distribution of the Barabási–Albert network are used to create a multivariate test of hypothesis. This is followed by an analysis of the sensitivity of the test to changes within the network, based on the proportion of edges and nodes deleted, to investigate how quickly the test detects a degrading Barabási–Albert network. Results from the sensitivity analysis show that the multivariate test using L-scale, L-skewness and L-kurtosis detects degradation at a faster rate than a univariate test alone.

1. Introduction

In many applications of dynamic network analysis, it is essential to monitor a network once it has been characterized. Detecting degradation in connectivity is necessary for monitoring the health of a network that may experience subtle perturbations. For example, in a computer network, connections between servers may occasionally be lost. Such losses, especially of critical connections, need to be detected as early as possible. The study of the effect of network degradation, or specifically network attack, on network function is not new and has gained renewed interest in recent years. Early on, Albert et al. demonstrated that the Barabási–Albert network is reasonably robust against random deletion of nodes in large networks, but that communication in the network suffers greatly from the attack on (removal of) a small portion of high-degree nodes [1]; this finding is supported by others for networks whose degree distribution follows a power law [2].
Degree distribution is not the only network measure examined; methods of attack based on measures such as betweenness, and their effects on network characteristics such as connectivity, percolation and clustering, have also been studied [3, 4]. More recently, studies of vulnerability to attack in interdependent networks have demonstrated the profound frailty of these networks to specific modes of attack [5, 6]. Therefore, a test that can detect the loss of a connection (loss of an edge) or the loss of a vertex (loss of a node) quickly, before substantial damage is done to the network, is very useful. This is especially so when a network is degrading while its performance appears unaffected: the degradation must then be detected quickly to prevent a catastrophic loss, possibly from subsequent, sequential attacks. Hence, any method for detecting changes within a network should have high power, that is, a high probability of detecting changes that occur within the network. We focus this work on networks that exhibit the scale-free property, a property possessed by various real-world networks that governs many social, physical and biological phenomena [7–11]. The Barabási–Albert network exhibits the scale-free property and is attractive for the simplicity of its mechanics and parameters [12]. Further, this work is motivated by the desire to detect degradation in the network before network performance is lost, and by the need for sensitive, automated detection tools that network analysts can use. Consequently, once a real network is characterized as Barabási–Albert, it can easily be simulated via a Barabási–Albert proxy for characterization and monitoring. Our approach to characterizing and subsequently detecting degradation focuses on probabilistic measures.
Although some researchers have examined the probabilistic aspect of a graph by measuring its entropy [13, 14], such measures do not capture the uniqueness of a given graph. Instead, one way of summarizing a probability distribution is through its set of moments, which describe its central location, scale, symmetry and peakedness, as well as higher-order characteristics. Under certain conditions, a probability distribution is uniquely determined by its moments, and in particular by its L-moments. It is therefore plausible that these characteristics, if applied to network analysis, may provide a sensitive probabilistic approach for detecting changes within the network with respect to network measures. The objective of this research is to create a test of hypothesis to detect subtle degradation in a Barabási–Albert network attributable to nodal or edge deletion. First, L-moments of the empirical degree distribution of the Barabási–Albert network are used to create a multivariate test of hypothesis. Then, the sensitivity of the multivariate test to changes within the network, based on the proportion of edges and nodes deleted, is analysed in order to describe how quickly the test detects degradation within the network. Here, degradation refers to changes in the structure of the network as reflected in its degree distribution, not to network performance, which is the definition used in some research fields. Finally, the test is applied to real-world network data before the discussion and conclusions.

2. Background

2.1 Degree distribution of the Barabási–Albert graph

The Barabási–Albert model is based on two mechanisms that govern the scale-free property of real-world networks: (1) networks expand continuously by the addition of new nodes and (2) new nodes attach preferentially to nodes that are already well connected. The model starts with an initial number of nodes, $$m_0$$, each having no edges.
This is followed by an iterative process of adding a single node with $$m$$ edges, where each edge connects the new node to an existing node $$i$$ with degree $$d_{i}$$ according to the linear preferential attachment probability \begin{align*} \pi(d_{i})=\frac{d_{i}}{\sum_{\forall j}d_{j}}, \end{align*} which is the probability that node $$i$$ will be attached to the new node. The nodal degree ($$d$$) distribution of the Barabási–Albert scale-free graph can be derived using mean field theory [8]: for a graph grown with $$m$$ edges per new node, the degree distribution may be expressed as \begin{align*} f_{d_{i}}\left(x\right) &=2m^{2}x^{-3}\frac{t}{n}, \end{align*} where $$t=n-m_0$$ is the number of iterations required to obtain a graph with $$n$$ nodes. Thus, for finite $$n$$, the degree distribution can be written as \begin{equation*} f_{d_{i}}\left(x\right)=2\left(m\sqrt{\left(n-m_{0}\right)/n}\right) ^{2}x^{-3}, \end{equation*} which implies that the degree distribution of the Barabási–Albert graph follows a $$Pareto\left(m\sqrt{\left(n-m_{0}\right)/n},2\right)$$ distribution, with scale $$m\sqrt{\left(n-m_{0}\right)/n}$$ and shape 2. Note that as the size of the graph increases, $$n\rightarrow \infty$$, the degree distribution converges to a $$Pareto\left(m,2\right)$$ distribution. The Pareto distribution is named in honour of the early work of Vilfredo Pareto [15] and is also known as a power law distribution. The parameters of the Pareto distribution can be estimated by maximum likelihood estimation (MLE), after which a test of hypothesis on the parameters can be performed. This is the basis of our method for detecting degradation within the Barabási–Albert network.

2.2 L-moments

L-moments were first proposed by Hosking [16] as a synthesis of earlier results by Gini [17], Sillitto [18, 19], Downton [20], Chan [21], Konheim [22], Mallows [23] and Greenwood et al. [24].
L-moments are linear combinations of order statistics that describe the location and shape of the probability distribution of a random variable, $$X$$, analogous to classical moments (e.g. the mean ($$E[X]$$), variance ($$Var[X]$$), skewness $$\left(\frac{E[(X-E[X])^3]}{Var[X]^{3/2}}\right)$$, kurtosis $$\left(\frac{E[(X-E[X])^4]}{Var[X]^2}\right)$$, etc.). The $$r$$th L-moment is defined as \begin{equation*} \lambda _{r}=\frac{1}{r}\sum_{i=0}^{r-1}\left(-1\right) ^{i}\binom{r-1}{i}E\left[ X_{r-i:r}\right], \end{equation*} where $$X_{j:n}$$ denotes the $$j$$th order statistic ($$j$$th smallest value) in an independent sample of size $$n$$. Note that $$\lambda _{1}=E[X_{1:1}]=E[X]$$, the classical mean. The $$r$$th L-moment ratio is defined as \begin{equation*} \tau _{r}=\frac{\lambda _{r}}{\lambda _{2}},\quad r=3,4,\ldots, \end{equation*} and is akin to a conventional standardized moment, such as skewness or kurtosis, but is bounded within the open interval $$(-1,1)$$. The 1st and 2nd L-moments are referred to as the L-mean and L-scale, respectively, whereas the 3rd and 4th L-moment ratios are referred to as L-skewness and L-kurtosis, respectively. Additionally, the second L-moment is strictly positive, and the fourth L-moment ratio, L-kurtosis, satisfies the tighter bound $$\frac{1}{4}(5\tau_{3}^{2}-1)\leq \tau_{4} < 1$$ [16]. Hosking [16] states that a distribution is uniquely characterized by its set of L-moments provided that its mean exists. Further, Hosking stated that, when the L-moments exist, the first two L-moments, $$\lambda_{1}$$ and $$\lambda_{2}$$, together with the third and fourth L-moment ratios, $$\tau_{3}$$ and $$\tau_{4}$$, are enough to summarize the main features of a probability distribution. Additionally, L-moments are considered more robust to outliers than conventional moments [16].
For example, a single extreme outlier inflates the variance markedly but affects the L-scale to a much smaller extent. Given a random sample $$(x_1,\ldots,x_n)$$, the unbiased sample L-moment $$l_r$$ is the U-statistic \begin{equation*} l_{r}=\binom{n}{r} ^{-1} \sum_{1\leq i_{1} < i_{2} < \ldots < i_{r} \leq n} \frac{1}{r} \sum_{j=0}^{r-1} (-1)^{j} \binom{r-1}{j} x_{i_{r-j}:n}, \quad r=1,2,\dots,n, \end{equation*} which can be computed via probability weighted moments (PWMs) [16]. However, Wang [25] derived direct estimators of the first four L-moments that circumvent the need for PWMs. These estimators are, respectively, \begin{gather*} \textstyle\widehat{\lambda }_{1} =\binom{n}{1}^{-1}\sum_{i=1}^{n}x_{\left(i\right) } \\ \textstyle\widehat{\lambda }_{2} =\frac{1}{2}\binom{n}{2}^{-1}\sum_{i=1}^{n}\left(\binom{i-1}{1}-\binom{n-i}{1}\right) x_{\left(i\right) } \\ \textstyle\widehat{\lambda }_{3} =\frac{1}{3}\binom{n}{3}^{-1}\sum_{i=1}^{n}\left(\binom{i-1}{2}-2\binom{i-1}{1}\binom{n-i}{1}+\binom{n-i}{2}\right) x_{\left(i\right) } \\ \textstyle\widehat{\lambda }_{4} =\frac{1}{4}\binom{n}{4}^{-1}\sum_{i=1}^{n}\left(\binom{i-1}{3}-3\binom{i-1}{2}\binom{n-i}{1}+3\binom{i-1}{1}\binom{n-i}{2}-\binom{n-i}{3}\right)x_{\left(i\right)} \end{gather*} where $$x_{(i)}$$ is the $$i$$th order statistic. L-moments can be computed for continuous or discrete random variables; however, expressions for the L-moments of common discrete distributions tend to be complicated [16]. Further, since some discrete random variables can be approximated by continuous random variables, certain results for the L-moments of continuous variables also hold for discrete random variables [26]. The theoretical values of L-scale, L-skewness and L-kurtosis given by Hosking [16, 26] for a selection of distributions are listed in Table 1.
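Wang's direct estimators above translate almost line by line into code. The plain-Python sketch below is an illustration only (the names `sample_l_moments` and `l_moment_ustat` are our own, not from any library); it also includes a brute-force version of the U-statistic $$l_r$$ so the two forms can be checked against each other on small samples.

```python
import statistics
from itertools import combinations
from math import comb


def sample_l_moments(x):
    """Wang's direct unbiased estimators of the first four L-moments.

    Returns (l1, l2, t3, t4): L-mean, L-scale, L-skewness, L-kurtosis.
    """
    xs = sorted(x)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    l1 = l2 = l3 = l4 = 0.0
    for i, xi in enumerate(xs, start=1):
        # a_k: ways to choose k smaller values; b_k: ways to choose k larger values
        a1, a2, a3 = comb(i - 1, 1), comb(i - 1, 2), comb(i - 1, 3)
        b1, b2, b3 = comb(n - i, 1), comb(n - i, 2), comb(n - i, 3)
        l1 += xi
        l2 += (a1 - b1) * xi
        l3 += (a2 - 2 * a1 * b1 + b2) * xi
        l4 += (a3 - 3 * a2 * b1 + 3 * a1 * b2 - b3) * xi
    l1 /= n
    l2 /= 2 * comb(n, 2)
    l3 /= 3 * comb(n, 3)
    l4 /= 4 * comb(n, 4)
    return l1, l2, l3 / l2, l4 / l2


def l_moment_ustat(x, r):
    """Brute-force U-statistic l_r (O(n^r) work; only for checking small samples)."""
    xs = sorted(x)
    total = 0.0
    for s in combinations(xs, r):  # subsets of a sorted list come out sorted
        total += sum((-1) ** j * comb(r - 1, j) * s[r - 1 - j] for j in range(r)) / r
    return total / comb(len(xs), r)


if __name__ == "__main__":
    base = list(range(1, 11))
    spiked = base[:-1] + [100]  # replace the largest value by an outlier
    for sample in (base, spiked):
        print(statistics.stdev(sample), sample_l_moments(sample)[1])
```

On the sample 1, 2, …, 10, replacing the largest value 10 with 100 multiplies the sample standard deviation by roughly 10 but the L-scale only by roughly 6 (from 165/90 ≈ 1.83 to 975/90 ≈ 10.83), which illustrates the robustness to outliers noted above.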
As of this writing, we have found no publication in the literature characterizing the sampling distributions of the L-moments. However, Elamir and Seheult [27] derived distribution-free expressions for the exact variances and covariances of the first four sample L-moments.

Table 1 L-scale, L-skewness and L-kurtosis of well known distributions as derived by Hosking [26]

| Distribution | L-Scale | L-Skewness | L-Kurtosis |
|---|---|---|---|
| Exponential $$(\lambda)$$ | $$\frac{1}{2\lambda}$$ | $$\frac{1}{3}$$ | $$\frac{1}{6}$$ |
| Normal $$(\mu,\sigma^2)$$ | $$\frac{\sigma}{\sqrt{\pi}}$$ | $$0$$ | $$30 \frac{1}{\pi} \tan^{-1} \sqrt{2}-9$$ |
| Pareto $$(\alpha,\beta)$$ | $$\frac{\alpha}{\beta (1-1/\alpha)(2-1/\alpha)}$$ | $$\frac{1+1/\alpha}{3-1/\alpha}$$ | $$\frac{(1+1/\alpha)(2+1/\alpha)}{(3-1/\alpha)(4-1/\alpha)}$$ |
| Uniform $$(a,b)$$ | $$\frac{(b-a)}{6}$$ | 0 | 0 |

3. Empirical distribution for the L-moments of the Barabási–Albert degree distribution

In order to use the L-moments of the degree distribution as statistical measures for identifying or characterizing a network, the distribution of each L-moment must first be identified. This section examines a nonparametric approach to deriving the distribution of the L-moments of the Barabási–Albert degree distribution. Because of the way the Barabási–Albert network is constructed, the mean degree is fixed for any network of a given size ($$n$$) and parameter ($$m$$).
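To see why the mean degree is fixed, note that each of the $$t=n-m_0$$ added nodes contributes exactly $$m$$ edges, so every realization has $$m(n-m_0)$$ edges and mean degree $$2m(n-m_0)/n$$, regardless of which targets are drawn (for example, $$m=2$$ and $$n=2^5$$ with $$m_0=2$$ give $$120/32=3.75$$). The plain-Python sketch below makes this concrete. It is a hypothetical illustration, not the generator used for the simulations: it assumes $$m_0=m$$, lets the first added node link to all $$m_0$$ initial nodes (the attachment probability is undefined while every degree is zero), and draws each later node's $$m$$ distinct targets with probability proportional to degree, redrawing duplicates, a common simplification.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a Barabasi-Albert graph and return {node: set(neighbours)}.

    Hypothetical sketch: m0 = m initial isolated nodes, the first new node
    links to all of them, later nodes pick m distinct targets with
    probability proportional to degree (duplicate draws are redrawn).
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Node i appears d_i times in this list, so rng.choice samples proportional to degree.
    repeated = []
    targets = list(range(m))
    for new in range(m, n):
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        repeated.extend(targets)
        repeated.extend([new] * m)
        chosen = set()
        while len(chosen) < m:  # m distinct targets for the next node
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return adj


def mean_degree(adj):
    """Average degree; always 2 * m * (n - m) / n for the generator above."""
    return sum(len(nb) for nb in adj.values()) / len(adj)


if __name__ == "__main__":
    for seed in range(3):
        g = barabasi_albert(32, 2, seed=seed)
        print(mean_degree(g))  # 3.75 for every seed
```

Only the shape of the degree distribution changes from one realization to the next, which is why the higher L-moments, not the L-mean, carry the information.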
Therefore, only the empirical distributions of the L-scale $$(\lambda_2)$$, L-skewness $$(\tau_3)$$ and L-kurtosis $$(\tau_4)$$ of the Barabási–Albert degree distribution are simulated. Each distribution was examined individually, and $$\lambda_2$$, $$\tau_3$$ and $$\tau_4$$ were also examined jointly, as suggested by Hosking [16]. Based on the findings of these examinations, distributional tests based on the normal distribution were conducted. The simulation was conducted as follows:

1. Bootstrap each L-moment distribution from $$1000$$ randomly generated Barabási–Albert graphs for each parameter combination $$m\in\{2,4,6\}$$ and $$n\in\{2^k:k=5,6,\ldots,15\}$$, where $$m_0=m$$.
2. Test each distribution for normality, using the Shapiro–Wilk and Anderson–Darling tests for univariate normality of the marginals, and the Royston H test [28], a multivariate extension of the Shapiro–Wilk test, for multivariate normality of the joint distributions.
3. Repeat steps 1–2 $$100$$ times to obtain the proportion of instances where the L-moment distributions are not significantly different from a normal distribution.
4. Compare the results of step 3 with those expected at the $$\alpha = 0.05$$ level.

The expected $$\lambda_2$$, $$\tau_3$$ and $$\tau_4$$ of the Generalized Pareto distribution were derived by Hosking [16]. The expected relationship between $$\tau_3$$ and $$\tau_4$$ is shown in Fig. 1, along with the pairwise empirical L-skewness and L-kurtosis of the simulated Barabási–Albert networks. Figure 1 also includes the empirical distribution of the L-skewness and L-kurtosis of the $$Pareto(m,2)$$ distribution, with sample sizes corresponding to the network sizes used in the Barabási–Albert sampling. Overall, although the L-moments of the Barabási–Albert degree distribution converge towards the expected $$(\tau_3,\tau_4)$$ values of the Pareto distribution (the point labelled 2 on the Generalized Pareto (Gen Pareto) line in Fig. 1), the distributions of the L-moments themselves do not lie on the line of expected values for the Pareto distribution. Another observation is that the separation between the L-moment distributions of different network sizes becomes more prominent as $$m$$ increases.

Fig. 1. Plot of $$\tau_4$$ vs $$\tau_3$$ for $$m=2,4,6$$ of the Barabási–Albert (BA) degree distribution and the Pareto distribution. (Note: Groupings range from $$k=5$$ (large spread) to $$k=15$$ (small spread). Points are expected $$(\tau_3,\tau_4)$$ values for the associated $$\beta$$ values of the Generalized Pareto (Gen Pareto) and Generalized Extreme Value (GEV) distributions. Exp is the expected $$(\tau_3,\tau_4)$$ for the Exponential distribution.)

One can argue that the empirical degree distribution of the Barabási–Albert graph differs from the theoretically derived one (Section 2.1), especially for relatively small graphs, as previously shown empirically by Mohd-Zaid et al. [29]. Further, the $$(\tau_3,\tau_4)$$ pair appears to have a bivariate distribution that resembles a normal distribution (Fig. 1). Therefore, the marginal and joint distributions of the L-moments were tested for normality, using the Shapiro–Wilk and Anderson–Darling tests for the marginals and the Royston H test for the joint distributions.
These tests show that none of the univariate distributions of $$\lambda_2$$ and $$\tau_4$$ differ significantly from the normal distribution, although the distributions of $$\tau_3$$ for $$k\leq 8$$ appear to be significantly different from the normal distribution (Table 2). Histograms of the L-moments for $$k\in \{6,15\}$$ are shown in Fig. 2, where the normal distribution appears to fit the empirical L-moment distributions closely, even for $$\tau_3$$. The multivariate normality tests on the joint distributions suggest that the non-normality of $$\tau_3$$ affects whether the joint distributions are significantly different from the multivariate normal (Table 3). Specifically, when $$\tau_3$$ is part of the joint distribution and $$k\leq 6$$, the joint distribution is significantly different from the multivariate normal more often than when $$\tau_3$$ is not part of it. Nevertheless, for $$k\geq 6$$ the univariate and multivariate distributions involving $$\tau_3$$ are not significantly different from normal in the large majority of samples (proportions $$\gg 0.5$$); $$\tau_3$$ was therefore retained when selecting the appropriate L-moments for the test of hypothesis.
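As a rough yardstick for step 4 (our own assumption, not necessarily the comparison made by the authors): if a distribution were exactly normal, the proportion of the $$100$$ repetitions in which a level-$$0.05$$ test does not reject would be binomial with mean $$0.95$$, giving

$$\mathrm{SE}=\sqrt{\frac{0.95\times 0.05}{100}}\approx 0.0218,\qquad 0.95\pm 1.96\times 0.0218\approx[0.907,\ 0.993].$$

By this yardstick, most $$\lambda_2$$ proportions in Table 2 fall inside or just below the band, whereas the $$\tau_3$$ proportions for $$k=5$$ (0.40 to 0.77) fall well below it.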
Table 2 Proportion of samples in which the distribution of each L-moment is not significantly different from the normal distribution

| k | m | $$\lambda_2$$ SW | $$\lambda_2$$ AD | $$\tau_3$$ SW | $$\tau_3$$ AD | $$\tau_4$$ SW | $$\tau_4$$ AD |
|---|---|---|---|---|---|---|---|
| 5 | 2 | 0.93 | 0.97 | 0.4 | 0.58 | 0.89 | 0.88 |
| 5 | 4 | 0.92 | 0.94 | 0.43 | 0.54 | 0.83 | 0.94 |
| 5 | 6 | 0.96 | 0.96 | 0.65 | 0.77 | 0.84 | 0.87 |
| 7 | 2 | 0.9 | 0.92 | 0.74 | 0.85 | 0.89 | 0.95 |
| 7 | 4 | 0.99 | 1.00 | 0.83 | 0.88 | 0.96 | 0.99 |
| 7 | 6 | 0.89 | 0.95 | 0.84 | 0.86 | 0.91 | 0.92 |
| 9 | 2 | 0.92 | 0.93 | 0.87 | 0.87 | 0.96 | 0.98 |
| 9 | 4 | 0.94 | 0.94 | 0.91 | 0.97 | 0.93 | 0.97 |
| 9 | 6 | 0.97 | 0.97 | 0.94 | 0.97 | 0.98 | 0.96 |
| 11 | 2 | 0.94 | 0.97 | 0.97 | 0.96 | 0.94 | 0.92 |
| 11 | 4 | 0.97 | 0.95 | 0.92 | 0.93 | 0.95 | 0.98 |
| 11 | 6 | 0.94 | 0.93 | 0.9 | 0.93 | 0.95 | 0.97 |

SW: Shapiro–Wilk, AD: Anderson–Darling tests for normality.

Fig. 2. Example histograms of L-scale, L-skewness and L-kurtosis for $$m=2,\ k=6$$.
Table 3 Proportion of samples in which the joint distribution of the L-moments is not significantly different from the multivariate normal distribution, based on the Royston H test

| k | m | τ2, τ3 | τ2, τ4 | τ3, τ4 | τ2, τ3, τ4 |
|---|---|---|---|---|---|
| 5 | 2 | 0.48 | 0.69 | 0.49 | 0.49 |
| 5 | 4 | 0.49 | 0.86 | 0.48 | 0.57 |
| 5 | 6 | 0.64 | 0.81 | 0.58 | 0.65 |
| 7 | 2 | 0.84 | 0.91 | 0.88 | 0.87 |
| 7 | 4 | 0.89 | 0.91 | 0.91 | 0.9 |
| 7 | 6 | 0.87 | 0.95 | 0.89 | 0.88 |
| 9 | 2 | 0.92 | 0.97 | 0.93 | 0.92 |
| 9 | 4 | 0.94 | 0.92 | 0.95 | 0.94 |
| 9 | 6 | 0.88 | 0.92 | 0.91 | 0.89 |
| 11 | 2 | 0.87 | 0.91 | 0.89 | 0.90 |
| 11 | 4 | 0.95 | 0.93 | 0.94 | 0.94 |
| 11 | 6 | 0.91 | 0.89 | 0.95 | 0.90 |

All $$100$$ empirical distributions of each L-moment were then pooled into one distribution of $$10^5$$ random samples to obtain a larger bootstrapped L-moment distribution. From these $$10^5$$ samples, the mean vector and covariance matrix of the L-moments, $$(\underset{\sim}{\mu}_{m,k},\Sigma_{m,k})$$, were estimated. These estimates were then used to standardize the L-moments, transforming them to the multivariate standard normal distribution and thereby removing the correlation between the L-moments. The transformed multivariate normal distribution was used to create a statistical test of hypothesis to differentiate between various Barabási–Albert networks.

4. Tests on degree L-moments for the Barabási–Albert network

4.1 Multivariate standard normal distribution in polar coordinates

A transformation of the bivariate and trivariate normal distributions from Cartesian to polar (spherical, in three dimensions) coordinates is used to create the appropriate tests of hypothesis.
This transformation reduces the collection of multivariate L-moments to a single value (the radius) that can be used as the criterion for the test of hypothesis. For the bivariate test, after transforming each L-moment into a standard normal variable and defining a pair of distinct L-moments (or L-moment ratios) as $$X$$ and $$Y$$, let $$R=\sqrt{X^2+Y^2}$$ be the radius of the $$(X,Y)$$ pair from the centroid of the bivariate standard normal distribution. Since $$R^2$$ then follows a chi-square distribution with two degrees of freedom, $$P(R>\sqrt{c})=e^{-c/2}$$, so the value of $$c$$ such that $$P(R>\sqrt{c})=\alpha$$ for a test that rejects $$H_0:(X,Y)\sim MVN(\mathbf{0},\mathbf{I})$$ in favour of $$H_A:(X,Y)\not\sim MVN(\mathbf{0},\mathbf{I})$$ is $$c=-2\ln{\alpha}$$, where $$\alpha$$ is the a priori Type-I error rate of the test. Correspondingly, the p-value for an observed radius $$r$$ of a specific Barabási–Albert network is $$P(R>r)=e^{-r^2/2}$$, and $$H_0$$ is rejected when $$r\geq\sqrt{-2\ln{\alpha}}$$. To extend this result to a third variable $$Z$$ that is also $$Normal(0,1)$$, define \begin{align*} X^2+Y^2+Z^2=R^2. \end{align*} The associated value of $$c$$ for the marginal distribution of $$R$$ such that $$P(R>\sqrt{c})=\alpha$$ is given by \begin{equation} \alpha =1-\text{erf}{\left(\sqrt{\frac{c}{2}}\right)}+\sqrt{\frac{2c}{\pi}}e^{-c/2}. \tag{4.1} \end{equation} To obtain the value of $$c$$ for a particular $$\alpha$$, Equation (4.1) can be solved numerically; since $$R^2$$ follows a chi-square distribution with three degrees of freedom, the solution is the upper-$$\alpha$$ quantile of that distribution. Table 4 lists the values of $$c$$ that satisfy Equation (4.1) for selected $$\alpha$$ values. With these results, pairwise and triple combinations of standardized L-moments can be tested simultaneously against their respective multivariate standard normal distributions.
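Equation (4.1) and the bivariate closed form can be checked numerically. The sketch below (function names are our own, for illustration) evaluates the tail probability $$P(R>\sqrt{c})$$ for one, two or three standardized L-moments using only the standard library, solves for $$c$$ by bisection, and turns an observed statistic $$S=r$$ into a p-value. The one-dimensional case uses $$P(|Z|>r)=\text{erfc}(r/\sqrt{2})$$, which matches a two-sided test on a single L-moment.

```python
import math


def tail(c, dim):
    """P(R > sqrt(c)) when R^2 is a sum of `dim` squared independent N(0, 1) variables."""
    if c < 0:
        raise ValueError("c must be non-negative")
    if dim == 1:
        return math.erfc(math.sqrt(c / 2))
    if dim == 2:
        return math.exp(-c / 2)
    if dim == 3:  # Eq. (4.1), written with erfc(x) = 1 - erf(x)
        return math.erfc(math.sqrt(c / 2)) + math.sqrt(2 * c / math.pi) * math.exp(-c / 2)
    raise ValueError("dim must be 1, 2 or 3")


def critical_value(alpha, dim):
    """Solve tail(c, dim) = alpha for c; tail is decreasing in c, so bisection works."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    if dim == 2:
        return -2 * math.log(alpha)  # closed form for the bivariate test
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if tail(mid, dim) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def p_value(s, dim):
    """p-value of an observed test statistic S = s under H0."""
    return tail(s * s, dim)


if __name__ == "__main__":
    for a in (0.010, 0.025, 0.050, 0.100, 0.150, 0.200):
        print(a, round(critical_value(a, 3), 4))  # compare with the c row of Table 4
```

The bisection bracket $$[0, 100]$$ is a convenience choice that comfortably covers the $$\alpha$$ values in Table 4; a library chi-square quantile function would serve equally well.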
Table 4 Values of $$c$$ such that $$P(R>\sqrt{c})=\alpha$$ for the trivariate normal distribution

| $$\alpha$$ | 0.010 | 0.025 | 0.050 | 0.100 | 0.150 | 0.200 |
|---|---|---|---|---|---|---|
| $$c$$ | 11.3449 | 9.3484 | 7.8147 | 6.2514 | 5.3171 | 4.6416 |

We propose three tests based on the L-moments of the degree distribution of a Barabási–Albert network, using the univariate standard normal distribution as well as the multivariate standard normal distribution. The tests are built on the hypotheses $$H_0:\underset{\sim}{\lambda} \in \mathbf{\Lambda}_{(m,k)}$$ vs $$H_A:\underset{\sim}{\lambda} \notin \mathbf{\Lambda}_{(m,k)}$$, where $$\underset{\sim}{\lambda}$$ is the collection of L-moment estimates for a given network and $$\mathbf{\Lambda}_{(m,k)}$$ is the empirical distribution of the L-moments, with mean and covariance structure $$(\underset{\sim}{\mu},\Sigma)$$, for a Barabási–Albert graph of size $$k$$ with parameter $$m$$ as estimated in Section 3. Define the standardized L-moments as $$\underset{\sim}{t}=\Sigma^{-\frac{1}{2}}(\underset{\sim}{\lambda}-\underset{\sim}{\mu})'$$ and the test statistic as \begin{equation*} S={\|\underset{\sim}{t}\|}_2. \end{equation*} Thus, for a given Type-I error rate $$\alpha$$, a test of $$H_0:\underset{\sim}{\lambda} \in \mathbf{\Lambda}_{(m,k)}$$ vs $$H_A:\underset{\sim}{\lambda} \notin \mathbf{\Lambda}_{(m,k)}$$ rejects $$H_0$$ if \begin{equation} S\geq \sqrt{c}, \tag{4.2} \end{equation} where $$c=-2\ln{\alpha}$$ for a bivariate test, and $$c$$ is given in Table 4 for specific $$\alpha$$ values of a trivariate test. For the univariate case, $$\underset{\sim}{t}$$ is a scalar $$t$$ with $$S=|t|$$, and the rejection criterion reduces to the two-sided standard normal test, which rejects $$H_0$$ if \begin{equation} t\leq z_{\alpha/2}\quad\text{or}\quad t\geq z_{1-\alpha/2}. \tag{4.3} \end{equation}

4.2 Power of the tests on degree L-moments

In order to develop a test that detects degradation within the degree distribution of a Barabási–Albert network, a necessary condition is that the test first be able to classify networks correctly with high power. Recall that the normality tests in Section 3 showed that the distribution of L-scale $$(\lambda_2)$$ was not significantly different from a normal distribution in roughly $$90\%$$ or more of the samples. In contrast, the distribution of L-skewness $$(\tau_3)$$ was not significantly different from a normal distribution in a noticeably smaller proportion of the samples when $$k\leq 8$$ (as low as 0.40 for $$k=5$$ in Table 2), despite the joint distributions of the L-moments not being significantly different from the multivariate normal for smaller $$k$$. Therefore, the power of the tests based on Equations (4.2) and (4.3) is investigated using L-scale $$(\lambda_2)$$, L-skewness $$(\tau_3)$$ and L-kurtosis $$(\tau_4)$$ individually, as well as their bivariate and trivariate combinations. For each $$m$$ and $$k$$ combination, a Barabási–Albert network is generated and assigned as the $$target$$ network.
Then the $$\lambda_2,\tau_3$$ and $$\tau_4$$ of its degree distribution are computed and compared with the estimated distributions from Section 3, which are designated as the $$class$$ networks. If the statistic for the $$target$$, as defined in Section 4.1, falls within the rejection region, the $$target$$ network is rejected as a member of the $$class$$. These steps are outlined in Algorithm 1. Rejections when $$target=class$$ (false rejections) and when $$target\neq class$$ (correct rejections) are aggregated to compute, respectively, the Type-I error and the power of the tests for each $$(target, class)$$ pair at each $$k$$.

Algorithm 1 L-moments classification algorithm

The simulation shows that the test using only $$\lambda_2$$, with the rejection region from Equations (4.2) and (4.3), maintained the nominal $$\alpha=0.05$$ level for all $$m\in\{1,\ldots,7\}$$ (empirical Type-I error rates $$\approx0.05$$ on the diagonals of Tables 5 and 6). Although not presented here, the test using only $$\lambda_2$$ outperforms the tests on $$\tau_3$$ and $$\tau_4$$. Additionally, we considered performance for networks with $$k\in \{5,\ldots,14\}$$, and this test is also shown to be quite powerful for classification: it correctly rejects when $$target\neq class$$ with probability one for $$k\geq 7$$. However, for $$k=5$$ the $$\lambda_2$$ test is more prone to misclassification when $$m\geq 4$$ (Table 5), and adding $$\tau_3$$ and $$\tau_4$$ to form the trivariate $$(\lambda_2,\tau_3,\tau_4)$$ test improves the power for these values of $$m$$ and $$k$$ (Table 6). Pairwise bivariate combinations of L-moments were also considered, but all were less powerful than the trivariate test.
Table 5 Power of the test using only $$\lambda_2$$ (rows: target $$m$$; columns: class $$m_1$$)

$$k=5$$

| Target $$(m)$$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 0.049 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 0.054 | 1 | 1 | 1 | 1 | 1 |
| 3 | 1 | 0.997 | 0.050 | 0.927 | 1 | 1 | 1 |
| 4 | 1 | 1 | 0.929 | 0.048 | 0.680 | 0.988 | 1 |
| 5 | 1 | 1 | 1 | 0.702 | 0.047 | 0.394 | 0.846 |
| 6 | 1 | 1 | 1 | 0.985 | 0.468 | 0.042 | 0.234 |
| 7 | 1 | 1 | 1 | 0.999 | 0.856 | 0.226 | 0.049 |

$$k=6$$

| Target $$(m)$$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 0.047 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 0.037 | 1 | 1 | 1 | 1 | 1 |
| 3 | 1 | 1 | 0.041 | 1 | 1 | 1 | 1 |
| 4 | 1 | 1 | 1 | 0.048 | 0.999 | 1 | 1 |
| 5 | 1 | 1 | 1 | 0.999 | 0.045 | 0.980 | 1 |
| 6 | 1 | 1 | 1 | 1 | 0.975 | 0.058 | 0.947 |
| 7 | 1 | 1 | 1 | 1 | 1 | 0.935 | 0.056 |

Table 6 Power of the test using $$(\lambda_2,\tau_3,\tau_4)$$ jointly (rows: target $$m$$; columns: class $$m_1$$)

$$k=5$$

| Target $$(m)$$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 0.077 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 0.069 | 1 | 1 | 1 | 1 | 1 |
| 3 | 1 | 1 | 0.054 | 1 | 1 | 1 | 1 |
| 4 | 1 | 1 | 1 | 0.068 | 1 | 1 | 1 |
| 5 | 1 | 1 | 1 | 1 | 0.067 | 1 | 1 |
| 6 | 1 | 1 | 1 | 1 | 0.997 | 0.049 | 0.989 |
| 7 | 1 | 1 | 1 | 1 | 1 | 0.948 | 0.051 |

$$k=6$$

| Target $$(m)$$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 0.065 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 0.053 | 1 | 1 | 1 | 1 | 1 |
| 3 | 1 | 1 | 0.052 | 1 | 1 | 1 | 1 |
| 4 | 1 | 1 | 1 | 0.056 | 1 | 1 | 1 |
| 5 | 1 | 1 | 1 | 1 | 0.058 | 1 | 1 |
| 6 | 1 | 1 | 1 | 1 | 1 | 0.07 | 1 |
| 7 | 1 | 1 | 1 | 1 | 1 | 1 | 0.048 |

One implication of the network classification results is that the L-moment tests are, in essence, tests for detecting changes in the degree distribution with respect to $$m$$. Thus, should the network behave in a way that causes its degree distribution to deviate from that of the initial network with minimum degree $$m$$, the $$(\lambda_2,\tau_3,\tau_4)$$ test is especially powerful in detecting such a change, even for smaller networks. However, a change in $$m$$ (the minimum degree) is discrete and has very low resolution. Therefore, the performance of the test needs to be evaluated against more subtle changes within the network. Hence, we examine the $$(\lambda_2,\tau_3,\tau_4)$$ test for change detection, since the combination of three L-moments makes it very sensitive to perturbations of the degree distribution of the Barabási–Albert network. In addition, the univariate test on $$\lambda_2$$ is examined as a baseline for comparison, since there are no concerns about non-normality for $$\lambda_2$$ and a test on $$\lambda_2$$ may be considered parsimonious for particular network sizes.

5. Sensitivity analysis of edge and node deletion

To determine how sensitive the $$\lambda_2$$ and $$(\lambda_2,\tau_3,\tau_4)$$ tests are to nodal or edge deletion within the Barabási–Albert network, the network is degraded at three levels of nodal degree: (1) nodes with the minimum degree $$m$$ (low degree), (2) nodes with degree equal to the median degree (medium degree) and (3) nodes with degree in the top $$1\%$$ of the network (high degree). For each degree level, the response of the tests to edge deletion and, separately, to node deletion is investigated by varying the proportion of edges or nodes deleted, $$p\in\{0.01,0.02,\ldots,0.1,0.15,0.2,0.3,0.4,0.5\}$$. The algorithms used for both methods of deletion are outlined in Algorithms 2 and 3.
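The three degree levels can be selected from a degree sequence as sketched below. Because Algorithms 2 and 3 are not reproduced here, the details are our own assumptions: "low" is taken as the nodes whose degree equals the observed minimum (typically $$m$$ in a Barabási–Albert graph), "medium" as the nodes whose degree equals the lower median (so that it is an observed integer degree), and "high" as the $$\lceil n/100\rceil$$ nodes of largest degree. The function name `degree_levels` is hypothetical.

```python
import statistics


def degree_levels(degree):
    """Split nodes into low / medium / high degree levels for targeted deletion.

    degree: dict mapping node -> degree. Hypothetical rules (assumptions):
      low    = nodes whose degree equals the minimum degree,
      medium = nodes whose degree equals the lower median degree,
      high   = the ceil(n / 100) nodes with the largest degrees (top 1%).
    """
    values = list(degree.values())
    d_min = min(values)
    d_med = statistics.median_low(values)  # always one of the observed degrees
    k_top = max(1, -(-len(values) // 100))  # integer ceil(n / 100), avoids float rounding
    by_degree = sorted(degree, key=degree.get, reverse=True)
    return {
        "low": {v for v, d in degree.items() if d == d_min},
        "medium": {v for v, d in degree.items() if d == d_med},
        "high": set(by_degree[:k_top]),
    }


if __name__ == "__main__":
    deg = {0: 5, 1: 2, 2: 2, 3: 3, 4: 9}
    print(degree_levels(deg))  # low: nodes 1 and 2, medium: node 3, high: node 4
```

Ties at the top-1% boundary are broken arbitrarily by the sort here; a fuller implementation would need an explicit rule for them.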
Algorithms 2 and 3 were embedded within Algorithm 4 to compute the power of both tests as a function of $$p$$ for each $$(m,k)$$ combination, with rejection as given by Equations (4.2) and (4.3). In essence, the power of the test is computed for each $$target$$ by deleting edges or nodes with the appropriate $$p$$ and measuring how often each test rejects the $$target$$ from each $$class$$ (i.e. detects that the network differs from its original target network).

Note that for node deletion, deleting a node also deletes all edges connected to it. Nevertheless, the resulting number of edges deleted is the same under edge deletion and under node deletion for low and medium degree nodes. For example, suppose that only nodes with degree equal to $$d$$ are affected and that the probability of deletion is $$p$$. Let $$\nu$$ be the number of nodes with degree equal to $$d$$.

- For edge deletion, the number of edges affected is $$e_v=\nu d$$, and the number of edges deleted is $$e_v p=\nu d p$$.
- For node deletion, the number of nodes deleted is $$v_1=\nu p$$, and the number of edges deleted with them is $$v_1 d=\nu p d$$.

The difference between the two methods therefore reduces to a subtle distinction. Under edge deletion, edges are deleted at random from all affected edges. Under node deletion, edges are also deleted at random, but they are concentrated on the selected nodes. Notably, the number of edges deleted is more variable at the high degree level. The degrees of the $$\nu$$ affected nodes vary there, whereas low (minimum) degree nodes all share the same degree.
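Because Algorithms 2 and 3 appear only as figures, the following is a hypothetical Python sketch of one way to implement them. A network is stored as a dictionary of neighbour sets. The code selects the low, medium or high degree targets and then deletes a proportion $$p$$ of either:

- the edges incident to those targets (edge deletion), or
- the target nodes themselves (node deletion).

The following details are assumptions for illustration, not taken from the paper:

- the helper names;
- the complete-graph seed in `ba_graph`;
- the use of the lower median for the medium level;
- the tie handling for the top $$1\%$$;
- deleting exactly round($$p\times$$count) items.

```python
import math
import random
import statistics


def ba_graph(n, m, rng):
    """Preferential-attachment network stored as {node: set_of_neighbours}.

    Assumption: the seed is a complete graph on m + 1 nodes.
    """
    adj = {v: set(range(m + 1)) - {v} for v in range(m + 1)}
    ends = [v for v in adj for _ in adj[v]]  # node v appears deg(v) times
    for v in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(ends))  # degree-proportional choice
        adj[v] = set(chosen)
        for w in chosen:
            adj[w].add(v)
            ends.extend((v, w))
    return adj


def target_nodes(adj, level, m):
    """Nodes at the low, medium or high degree level (assumed reading of the text)."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    if level == "low":
        return [v for v, d in deg.items() if d == m]
    if level == "medium":
        med = statistics.median_low(list(deg.values()))  # always an observed degree
        return [v for v, d in deg.items() if d == med]
    if level == "high":
        top = max(1, math.ceil(0.01 * len(deg)))  # number of nodes in the top 1%
        cutoff = sorted(deg.values(), reverse=True)[top - 1]
        return [v for v, d in deg.items() if d >= cutoff]  # ties are kept
    raise ValueError(f"unknown degree level: {level}")


def delete_edges(adj, targets, p, rng):
    """Edge deletion: remove round(p * e) of the e edges incident to the targets."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    # Each edge is listed once, even if both of its ends are targets.
    affected = sorted({(min(u, w), max(u, w)) for u in targets for w in g[u]})
    for u, w in rng.sample(affected, round(p * len(affected))):
        g[u].discard(w)
        g[w].discard(u)
    return g  # nodes left with no edges stay in g as isolates


def delete_nodes(adj, targets, p, rng):
    """Node deletion: remove round(p * nu) target nodes together with all their edges."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    for v in rng.sample(sorted(targets), round(p * len(targets))):
        for w in g.pop(v):
            g[w].discard(v)
    return g


def n_edges(g):
    return sum(len(nbrs) for nbrs in g.values()) // 2


rng = random.Random(1)
g0 = ba_graph(2 ** 8, 2, rng)  # k = 8, m = 2
low = target_nodes(g0, "low", 2)
removed_by_edges = n_edges(g0) - n_edges(delete_edges(g0, low, 0.1, rng))
removed_by_nodes = n_edges(g0) - n_edges(delete_nodes(g0, low, 0.1, rng))
print(len(low), removed_by_edges, removed_by_nodes)
```

At the low degree level, both printed edge counts should be of the same order as $$\nu d p$$ with $$d=m=2$$. They need not match exactly, because an edge joining two selected nodes is listed or removed only once.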
Algorithm 2 Edge deletion algorithm

Algorithm 3 Node deletion algorithm

Algorithm 4 L-moments change detection algorithm

5.1 Edge deletion

5.1.1 Characteristics of edge deletion

We first investigate the characteristics of the network under the deletion process, in terms of the resulting isolates, components and clustering coefficient. Isolates are nodes with degree zero resulting from the deletion process. Components are the connected subgraphs, disconnected from one another, into which the deletion process splits the network.

Note that when the minimum degree is 1 ($$m=1$$), the clustering coefficient is always zero. Each new node adds only one edge as the network grows, so no triad can form.

When considering the number of isolates and components caused by edge deletion, it is apparent that as the minimum degree $$m$$ increases, the networks become less affected by the deletion process, resulting in fewer isolates and fewer components (Tables 7 and 8). It is also apparent that the network size ($$k$$) and proportion of deletion ($$p$$) required to produce isolates and components become larger as $$m$$ increases. This is expected, since most nodes in a network with $$m=1$$ have only a single edge connecting them to another node. Thus, almost any deletion will create isolates, which eventually leads to the network fragmenting into more components.

Table 7 Summary of isolates caused by edge deletion

                                              No. of isolates, 95% CI
m  Degree level  Range of k  Range of p       (lowest)    (highest)
1  Low           All         All              (6,8)       (4876,4956)
1  Medium        6–14        All              (6,8)       (4876,4956)
1  High          9–14        All              (6,19)      (553,688)
2  Low           7–14        0.04–0.5         (5,13)      (1813,1910)
2  Medium        9–14        0.15–0.5         (5,15)      (537,615)
2  High          13, 14      0.4, 0.5         (6,21)      (17,39)
3  Low           8–14        0.15–0.5         (7,17)      (716,792)
3  Medium        10–14       0.3–0.5          (9,22)      (224,282)
4  Low           9–14        0.3–0.5          (5,15)      (288,345)
4  Medium        13, 14      0.5              (11,71)     (28,134)
5  Low           11–14       0.3–0.5          (10,25)     (118,158)
5  Medium        14          0.5              None        (11,28)
6  Low           12–14       0.4, 0.5         (8,23)      (46,75)
7  Low           13, 14      0.5              (7,21)      (18,38)
CI: confidence interval.

Table 8 Summary of components resulting from edge deletion

                                              No. of components, 95% CI
m  Degree level  Range of k  Range of p       (lowest)    (highest)
1  Medium        5           0.4, 0.5         None        (1,2)
1  High          All         All              (1,2)       (823,962)
2  Medium        All         0.04–0.5         (1,2)       (39,66)
2  High          10–14       0.2–0.5          (1,2)       (1,4)
3  Medium        7–14        0.2–0.5          (1,2)       (2,10)
4  Medium        11–14       0.4, 0.5         (1,2)       (1,3)
5  Medium        14          0.5              None        (1,2)

Further examination of the number of components caused by the deletion process has implications for real-world applications in which the vulnerability of a particular network is of interest, either for interdiction or for protection. Figures 3 and 4 show that even for small $$k$$, edge deletion can break the network into multiple components, although smaller proportions ($$p$$) do not have the same effect. However, as $$k$$ gets larger, smaller proportions also start to affect the number of resulting components when $$m=1,2$$ (Figs. 3 and 4). For $$m=1$$, this effect is produced by high degree edge deletion, whereas for $$m=2$$ it is produced by medium degree edge deletion. Other combinations of degree level and $$m$$ show no pattern suggesting an effect on the number of isolates and components.

Fig. 3. Number of components (x-axis) caused by high degree edge deletion on $$m=1$$. (Bars are $$95\%$$ CI.)

Fig. 4. Number of components (x-axis) caused by medium degree edge deletion on $$m=2$$. (Bars are $$95\%$$ CI.)

Turning to the clustering coefficient, only edge deletion on high degree nodes produced a significant change in the clustering coefficient as the proportion of deletion $$p$$ increases, and this held for all networks with $$m\neq 1$$ (Fig. 5). However, the size $$k$$ at which the clustering coefficient becomes significantly different as a function of $$p$$ varies with the minimum degree $$m$$. The clustering coefficient appears to become significantly smaller as $$p$$ increases, but only for $$k\geq 9$$.

This result runs counter to the results on isolates and components, which showed that as $$m$$ increases, the networks become less affected by edge deletion on high degree nodes. It suggests that the clustering coefficient captures a characteristic of the network's response to edge deletion that the numbers of isolates and components alone do not.

Overall, edge deletion appears to create isolates uniformly across all network sizes, but it creates components only at the medium and high degree levels. Even so, the clustering, or triadic structure, of the network is not affected unless the high degree nodes are affected.

Fig. 5. Clustering coefficients (x-axis) of networks after edge deletion on high degrees. (Bars are $$95\%$$ CI.)
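The three characteristics used in this subsection can be computed directly from a network stored as a dictionary of neighbour sets. The Python sketch below is an illustration, not the authors' code. It rests on two assumptions, since the exact definitions behind Tables 7 and 8 and Fig. 5 are not restated here:

- isolates are counted separately from components;
- the clustering coefficient is the average local clustering coefficient.

```python
from collections import deque


def count_isolates(adj):
    """Nodes left with degree zero."""
    return sum(1 for nbrs in adj.values() if not nbrs)


def count_components(adj, include_isolates=False):
    """Connected components found by breadth-first search.

    Assumption: isolates are reported separately, so by default they are
    not counted as components.
    """
    seen, count = set(), 0
    for start, nbrs in adj.items():
        if start in seen or (not nbrs and not include_isolates):
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return count


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0.

    Assumption: the paper may instead use global transitivity; both are
    zero on a tree, which is the m = 1 case.
    """
    if not adj:
        return 0.0
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue
        # number of edges among the neighbours of v
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


# A triangle, a separate edge and one isolate
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}, 5: set()}
print(count_isolates(g), count_components(g), average_clustering(g))  # prints 1 2 0.5
```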
5.1.2 Power of the test on edge deletion

For low degree and high degree edge deletion, the $$\lambda_2$$ test becomes more likely to reject the hypothesis that the degraded network belongs to its original network as the proportion of edges deleted ($$p$$) increases. In other words, it becomes more likely to detect the change. For $$m=2$$, however, the power drops unexpectedly when $$p$$ becomes too large (Fig. 6). As noted earlier, all combinations of $$m\in\{1,\ldots,7\}$$ and $$k\in\{5,\ldots,14\}$$ were considered. For the sake of conciseness, only selected cases are illustrated and homogeneous results are not discussed.

Figure 7 shows the distribution of $$\lambda_2$$ with respect to $$p$$ for $$m=2$$. For low (medium) degree deletion, the distribution first increases (decreases) and then decreases (increases), which explains the patterns in the power curves. The multivariate $$(\lambda_2,\tau_3,\tau_4)$$ test, by contrast, outperforms the univariate $$\lambda_2$$ test in almost all cases when $$k$$ is smaller, and it maintains its power even when $$p$$ becomes large (Fig. 8).

For medium degree edge deletion, the $$(\lambda_2,\tau_3,\tau_4)$$ test again outperforms the $$\lambda_2$$ test. Its power, however, is less stable for smaller $$k$$ than for low and high degree deletion, and there is a drop in power around $$p=0.2$$ for $$k\geq 8$$ and $$m\geq 3$$. Additionally, the power of the $$(\lambda_2,\tau_3,\tau_4)$$ test decreases as $$m$$ increases for a fixed $$k$$, although it still approaches 1 as $$p$$ increases. This suggests that subtle degradation due to edge deletion is harder to detect when only the medium (median) degree nodes are affected.

Fig. 6. Power curve for low degree with $$m=2$$ and $$k=10$$.

Fig. 7. Boxplot of $$\lambda_2$$ with respect to $$p$$ (x-axis) for $$m=2$$ of low degree deletion.

Fig. 8. Power vs $$p$$ for low (top), medium (middle) and high (bottom) degree edge deletion with $$m\in\{1,4,7\}$$ and $$k\in\{5,10,14\}$$. Gray: $$\lambda_2$$; black: $$(\lambda_2,\tau_3,\tau_4)$$.

It is standard practice to regard a power of 80% as sufficient for a good test. We therefore compare the $$\lambda_2$$ and $$(\lambda_2,\tau_3,\tau_4)$$ tests on the minimum network size ($$k$$) required to reach 80% power at each proportion of deletion ($$p$$). As shown in Table 9, the $$\lambda_2$$ test could not achieve good power for some values of $$p$$, whereas the multivariate $$(\lambda_2,\tau_3,\tau_4)$$ test did so in almost all cases. As $$k$$ increases, the proportion of deleted edges needed to detect the change drops from as much as 40–50$$\%$$ to as little as $$1\%$$.

Although the $$(\lambda_2,\tau_3,\tau_4)$$ test could not always achieve $$80\%$$ power when $$m\geq 5$$ for $$k=5$$, it generally outperforms the $$\lambda_2$$ test, detecting a network change with up to 20% less deletion. This is attributable to the additional sensitivity to change that $$\tau_3$$ and $$\tau_4$$ add to $$\lambda_2$$. Overall, when only minimum degree (low degree) nodes are affected, 20% or more of the edges must be deleted before the $$\lambda_2$$ test detects a change in the network.
However, the $$(\lambda_2,\tau_3,\tau_4)$$ test can detect the change with only 1–15% of edges deleted for $$m\geq 2$$. When high degree nodes are affected, the $$(\lambda_2,\tau_3,\tau_4)$$ test can detect degradation of large networks with as little as 2% of edges deleted.

Table 9 Smallest proportion of edge deletion, $$p$$, required to achieve 80% power

                     k = 5             k = 7             k = 10
Degree level  m    λ2     joint      λ2     joint      λ2     joint
Low           1    X      0.30       X      0.30       0.40   0.30
Low           3    X      0.04       X      0.02       0.20   0.01
Low           5    X      0.07       0.50   0.03       0.20   0.01
Low           7    X      0.15       0.50   0.04       0.20   0.01
Medium        1    X      0.40       X      0.40       0.40   0.30
Medium        3    X      X          X      0.20       X      0.15
Medium        5    X      X          X      0.40       0.50   0.30
Medium        7    X      X          X      0.40       X      0.30
High          1    0.50   0.30       0.30   0.09       0.15   0.02
High          3    X      0.40       0.40   0.30       0.10   0.07
High          5    X      X          0.30   0.30       0.08   0.06
High          7    X      X          0.30   0.30       0.07   0.06
λ2: test using $$\lambda_2$$ only; joint: test using $$(\lambda_2,\tau_3,\tau_4)$$; X: 80% power not reached within the range of $$p$$ considered.

5.1.3 Implication of edge deletion

The results of edge deletion have several implications. Edge deletion can be thought of as destroying the connectors within a network. For example, in a road network between places of interest, edge deletion corresponds to destroying or seizing the roads connecting those places. In a social context, it corresponds to intercepting or blocking the means of communication between individuals, or even to ruining the relationships between individuals.
Depending on the objective, one might be more interested in degrading a network, or in detecting whether the network is being degraded and reacting accordingly.

The most obvious implication of this research is that nodes of Barabási–Albert networks with lower ‘degree’ (fewer connections) are at risk of being isolated, or of being fragmented into multiple sub-networks, if connections between nodes are deleted. This is especially true for connections of nodes with minimum or median ‘degree’. For these networks, however, the clustering within the remaining connected nodes is not affected, which implies that the connections within the remaining sub-networks are intact.

For networks whose minimum ‘degree’ is larger, the more effective way to degrade the network is to delete connections of nodes with high degree. Although this will not break the network into sub-networks, it will reduce the clustering within the network and reduce its connectivity. These results are summarized in Table 10.

Table 10 Recommended degree level at which to perform edge deletion that results in isolates, components and changes in clustering

m (minimum degree)  Isolates            Components     Clustering
1                   Low, Medium, High   High           None
2                   Low, Medium         Medium, High   High
3                   Low, Medium         Medium         High
4                   Low                 Medium         High
5                   Low                 None           High
6                   None                None           High
7                   None                None           High

Nevertheless, if one is concerned with detecting degradation within the network, this can be achieved with good power using the $$(\lambda_2,\tau_3,\tau_4)$$ test on the L-moments of the degree distribution.
This is especially true for degradation caused by edge deletion on nodes with minimum and median degree, where the $$(\lambda_2,\tau_3,\tau_4)$$ test was shown to outperform the $$\lambda_2$$ test. For edge deletion on high degree nodes, the two tests perform comparably. Even there, the $$(\lambda_2,\tau_3,\tau_4)$$ test detects degradation at a smaller proportion of deletion for networks with smaller minimum degree and size (Table 9).

5.2 Node deletion

5.2.1 Characteristics of node deletion

Node deletion appears to cause fewer isolates and components than edge deletion, since only a few combinations of degree level, $$k$$ and $$p$$ resulted in isolates or components. Note, however, that node deletion may itself remove potential isolates, especially when the affected nodes are at the low degree level. A direct comparison of the isolates caused by the two deletion methods is therefore not appropriate.

Although all combinations of $$m\in\{1,\ldots,7\}$$ and $$k\in\{5,\ldots,14\}$$ were considered, the discussion focuses on lower $$m$$ and $$k$$, where the results are less homogeneous. As with edge deletion, the networks become much less affected by node deletion as $$m$$ increases (Tables 11 and 12). In fact, only networks with $$m=1,2$$ are affected by node deletion. Networks with $$m=1$$ are more affected than those with $$m=2$$, with numbers of isolates and components larger by orders of magnitude. Again, this is due to the small degrees in a network with $$m=1$$.

Table 11 Summary of isolates caused by the node deletion process

                                              No. of isolates, 95% CI
m  Degree level  Range of k  Range of p       (lowest)    (highest)
1  High          9–14        0.02–0.5         (6,25)      (532,700)
2  Medium        12–14       0.3–0.5          (7,23)      (16,40)
2  High          13, 14      0.4, 0.5         (6,22)      (16,40)

Table 12 Summary of components resulting from the node deletion process

                                              No. of components, 95% CI
m  Degree level  Range of k  Range of p       (lowest)    (highest)
1  High          All         All              (1,5)       (699,895)
2  Medium        10–14       0.2–0.5          (1,2)       (1,3)
2  High          10–14       0.2–0.5          (1,2)       (1,4)

Further investigation of the number of components caused by node deletion shows that only the high degree level with $$m=1$$ is heavily affected by the deletion process. When $$k$$ is small, a higher proportion of deletion ($$p$$) is needed to fragment the network (Fig. 9). However, when $$k$$ is large, the resulting number of components is comparable to that of edge deletion.

Fig. 9. Number of components (x-axis) caused by high degree node deletion on $$m=1$$. (Bars are $$95\%$$ CI.)
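A standard graph-theoretic fact helps explain why only the high degree level with $$m=1$$ fragments so heavily. If every new node attaches with exactly one edge, the Barabási–Albert network is a tree. Deleting a node of degree $$d$$ from a tree leaves exactly $$d$$ pieces, some of which may be single isolated nodes.

The Python sketch below checks this on simulated trees. Two things are assumptions for illustration: the generator, which starts from a single edge, and the helper names.

```python
import random
from collections import deque


def ba_tree(n, rng):
    """m = 1 preferential attachment starting from one edge: the result is a tree."""
    adj = {0: {1}, 1: {0}}
    ends = [0, 1]  # node v appears deg(v) times
    for v in range(2, n):
        w = rng.choice(ends)  # degree-proportional choice
        adj[v] = {w}
        adj[w].add(v)
        ends.extend((v, w))
    return adj


def pieces_after_removing(adj, v):
    """Connected pieces (isolated nodes included) left after deleting node v."""
    rest = {u: nbrs - {v} for u, nbrs in adj.items() if u != v}
    seen, pieces = set(), 0
    for start in rest:
        if start in seen:
            continue
        pieces += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in rest[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return pieces


rng = random.Random(0)
tree = ba_tree(2 ** 10, rng)  # k = 10, m = 1
hub = max(tree, key=lambda u: len(tree[u]))  # highest-degree node
print(len(tree[hub]), pieces_after_removing(tree, hub))  # the two numbers agree
```

When $$m\geq 2$$, each new node brings several edges, so the network contains cycles and deleting a single hub need not disconnect anything.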
As with edge deletion, the clustering coefficient under node deletion indicates that clustering is affected only when high degree nodes are deleted. Unlike edge deletion, however, only networks with $$m\geq 3$$ are affected (Fig. 10), and the effects are less prominent than for edge deletion. When $$m=2$$, the clustering coefficient does not change significantly as more nodes are deleted (i.e. as $$p$$ increases), even when the network is large ($$k=14$$). Again, this result is the opposite of those for isolates and components.

Thus, when the minimum degree is small ($$m\leq 2$$), node deletion on high degree nodes produces a large number of isolates and components, but the clustering of the network is not affected. When the minimum degree is not small ($$m\geq 3$$), the network stays connected, but the remaining nodes become less clustered, as shown by the decreasing clustering coefficient.

Fig. 10. Clustering coefficients (x-axis) of networks after node deletion on high degrees. (Bars are $$95\%$$ CI.)

5.2.2 Power of the test on node deletion

The $$\lambda_2$$ test appears to have very low power for medium degree node deletion with $$m\geq 2$$. It barely achieves 80% power unless $$p$$ and $$k$$ are very large (Fig. 11 and Table 13). Medium degree node deletion also appears to be the only level of node deletion at which the $$(\lambda_2,\tau_3,\tau_4)$$ test clearly outperforms the $$\lambda_2$$ test. For low and high degree node deletion, the two tests appear to have roughly equal power. Nevertheless, for low degree node deletion, the $$(\lambda_2,\tau_3,\tau_4)$$ test outperforms the $$\lambda_2$$ test when $$k$$ is small or when $$m$$ is large.
For high degree node deletion, both tests perform similarly, and neither test is effective when $$k$$ is small.

Fig. 11. Power vs $$p$$ for low (top), medium (middle) and high (bottom) degree node deletion with $$m\in\{1,4,7\}$$ and $$k\in\{5,10,14\}$$. Dashed: $$k=5$$; dotted: $$k=9$$; solid: $$k=14$$; gray: $$\lambda_2$$; black: $$(\lambda_2,\tau_3,\tau_4)$$.

Table 13 Smallest proportion of node deletion, $$p$$, required to achieve 80% power

                     k = 5             k = 7             k = 10
Degree level  m    λ2     joint      λ2     joint      λ2     joint
Low           1    X      0.30       X      0.30       0.40   0.30
Low           3    X      X          X      X          0.50   0.40
Low           5    X      X          X      X          X      0.40
Low           7    X      X          X      X          X      X
Medium        1    X      0.40       X      0.40       0.40   0.30
Medium        3    X      X          X      0.20       X      0.05
Medium        5    X      X          X      0.50       X      0.15
Medium        7    X      X          X      X          X      0.30
High          1    X      X          X      0.50       0.20   0.10
High          3    X      X          X      0.50       0.20   0.10
High          5    X      X          0.50   0.50       0.20   0.10
High          7    X      X          0.50   0.50       0.10   0.10
λ2: test using $$\lambda_2$$ only; joint: test using $$(\lambda_2,\tau_3,\tau_4)$$; X: 80% power not reached within the range of $$p$$ considered.

Comparing, for node deletion, the minimum network size $$k$$ required for a test to achieve 80% power against the proportion of deletion $$p$$ leads to several observations.
For low degree node deletion, the $$\lambda_2$$ test could not achieve good power unless $$k\geq 10$$, whereas the $$(\lambda_2,\tau_3,\tau_4)$$ test achieves good power once the proportion of deletion is at least 30%. For medium degree node deletion, the $$\lambda_2$$ test could not achieve the desired power in most cases. Even when it did, the required network size was very large $$(k\geq 10)$$ and the required proportion was 20% or more. In comparison, the $$(\lambda_2,\tau_3,\tau_4)$$ test can achieve good power when deleting as little as 5% of nodes (Table 13). Lastly, for high degree node deletion the two tests perform comparably in terms of the minimum required size $$(k\geq 7)$$, with the $$(\lambda_2,\tau_3,\tau_4)$$ test only slightly better, requiring at most 10% less deletion.

One observation applies to all three node deletion levels. Edge deletion showed a drastic drop in the required network size, whereas for node deletion the change is gradual, and larger networks are required for change detection. Furthermore, a higher proportion of nodes than of edges must be affected to detect a change. These results imply that, whether the goal is to detect degradation or to degrade a network, edge deletion yields a larger effect from a smaller proportion affected than node deletion does.

5.2.3 Implication of node deletion

The results for node deletion have implications similar to those for edge deletion. Unlike edge deletion, node deletion can be thought of as removing the actors or entities of interest within a network. Returning to road networks, node deletion is analogous to destroying the places of interest themselves, such as crossroads. In a social context, it corresponds to detaining or removing specific individuals from the network.
Note that in both contexts, once the entities are removed, all connections between them and others in the network are rendered useless. This corresponds directly to edges being removed along with nodes in a graph.

The characteristics of node deletion show that, in networks with fewer connections, nodes are at risk of being isolated or fragmented into multiple sub-networks if the nodes with more connections are deleted. However, the clustering within the remaining connected nodes is not affected. For networks whose minimum degree is larger ($$m\geq 3$$), the more effective way to degrade the network is, as with edge deletion, to delete highly connected nodes. This reduces the clustering within the network and reduces its connectivity. These results are summarized in Table 14.

Table 14 Recommended degree level at which to perform node deletion that results in isolates, components or changes in clustering

m (minimum degree)  Isolates       Components     Clustering
1                   High           High           None
2                   Medium, High   Medium, High   None
3                   None           None           High
4                   None           None           High
5                   None           None           High
6                   None           None           High
7                   None           None           High

For detecting degradation, the $$(\lambda_2,\tau_3,\tau_4)$$ test was shown to perform better than the $$\lambda_2$$ test, especially for degradation caused by deleting nodes with minimum and median degree. However, the $$\lambda_2$$ test performs comparably to the $$(\lambda_2,\tau_3,\tau_4)$$ test in detecting node deletion on high degree nodes, except for specific network sizes.

6. Real networks application

To test the usefulness of the proposed method for real-world applications, we applied it to a convenience sample of available real network data.
The datasets summarized in Table 15 comprise networks from the literature [33, 39] that are believed to be scale free. They come from a variety of fields and vary in size. All networks were treated as undirected for the analysis.

Each network was first characterized by its closest Barabási–Albert network proxy, using the test on the L-moments of its degree distribution. Only the Karate Club (Karate) network could be characterized directly as a Barabási–Albert network, with $$m=3$$ and $$k=6$$. For the Political Books Co-purchases (Polbooks) and Co-authorship of Network Scientists (Netscience) networks, however, sub-networks could be characterized as Barabási–Albert networks. Here, a sub-network is the portion of a network consisting only of nodes with degree at least $$m$$; it represents the structure of the network's hubs.

For the Netscience network, the sub-network of nodes with degree $$\geq 3$$ could be characterized as a Barabási–Albert network with $$m=3$$ and $$k=5$$. Similarly, for the Polbooks network, the sub-network of nodes with degree $$\geq 5$$ could be characterized as a Barabási–Albert network with $$m=5$$ and $$k=6$$. The actual size of the Netscience sub-network is $$N=796$$ ($$k\approx 10$$), but the closest characterization is for $$k=5$$. We nevertheless proceed with the $$m=3$$, $$k=5$$ network proxy, since it closely characterizes the degree distribution of the sub-network.
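The sub-network construction and the size exponent $$k$$ in Table 15 can be sketched in Python as follows. Reading Table 15 as $$k=\log_2 N$$ follows from its column header. Two things are assumptions for illustration: that degrees are measured in the full network before the induced sub-network is formed, and the helper names.

```python
import math


def hub_subnetwork(adj, min_degree):
    """Sub-network induced by nodes whose degree in the full network is >= min_degree.

    Assumption: edges to dropped nodes are removed, so some kept nodes may end
    up with degree below min_degree inside the sub-network.
    """
    keep = {v for v, nbrs in adj.items() if len(nbrs) >= min_degree}
    return {v: adj[v] & keep for v in keep}


def size_exponent(n_nodes):
    """k such that N = 2**k, as in the k column of Table 15."""
    return math.log2(n_nodes)


g = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1, 4}, 4: {3}}
sub = hub_subnetwork(g, 3)  # keeps nodes 0, 1 and 3
print(sorted(sub), round(size_exponent(34), 4))  # prints [0, 1, 3] 5.0875
```

The Karate Club network has 34 nodes, which is consistent with its entry of 5.0875 in Table 15.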
Table 15 Real world data description Network Brief description Type $$k:N=2^k$$ References Karate Club Social network of friendships Undirected 5.0875 [30] Dolphin Social Network Social network of dolphins Undirected 5.9542 [31] Les Miserables Coappearance of Les Miserables characters Undirected 6.2668 [32] Political Books Co-purchases of Political Books on Amazon.com Undirected 6.7142 [33] Political Blogs Hyperlinks between US politics weblogs Directed 10.5411 [34] Network Scientists Collaborations Coauthorship of Network Scientists Undirected 10.6339 [35] Facebook Social Circles Social network of friendships Undirected 11.9798 [36] High-Energy Theory Collaborations Coauthorships between scientists Undirected 13.0295 [37] Astrophysics Collaborations Coathorships between scientists Undirected 14.0281 [37] Internet Internet structure Undirected 14.4870 [33] Condensed Matter Collaborations Coauthorships between scientists Undirected 15.3028 [37] Google Webgraphs Network of hyperlinks between webpages Directed 19.7401 [38] Network Brief description Type $$k:N=2^k$$ References Karate Club Social network of friendships Undirected 5.0875 [30] Dolphin Social Network Social network of dolphins Undirected 5.9542 [31] Les Miserables Coappearance of Les Miserables characters Undirected 6.2668 [32] Political Books Co-purchases of Political Books on Amazon.com Undirected 6.7142 [33] Political Blogs Hyperlinks between US politics weblogs Directed 10.5411 [34] Network Scientists Collaborations Coauthorship of Network Scientists Undirected 10.6339 [35] Facebook Social Circles Social network of friendships Undirected 11.9798 [36] High-Energy Theory Collaborations Coauthorships between scientists Undirected 13.0295 [37] Astrophysics Collaborations Coathorships between scientists Undirected 14.0281 [37] Internet Internet structure Undirected 14.4870 [33] Condensed Matter Collaborations Coauthorships between scientists Undirected 15.3028 [37] Google Webgraphs Network of hyperlinks 
between webpages Directed 19.7401 [38] Following the characterization of the network and sub-networks, we then performed a degradation method similar to the one conducted on the simulated networks in Section 5 where $$1000$$ deletion sampling were performed at each degree level in order to investigate how well the method performs in characterizing and detecting degradation in a real-world network. Note that at this point we make the assumption that the Barabási–Albert network proxies and their L-moment properties are true representations of the characterized real-world network. Additionally, it should be noted that for the Karate and Netscience networks, the median degrees are equal to their respective $$m$$ values thus low and medium degrees deletion produced the same results and only low degree deletion is reported. Edge deletion on low degree nodes of the Karate network was not able to be detected until the proportion of deletion is at least $$6\%$$ with $$78.2\%$$ power (Fig. 12) which is comparable to the simulated result presented in Table 9 for that $$m$$ and $$k$$ combination. The power then peaks at $$94.7\%$$ when the proportion of deletion reaches $$15\%$$ but decreases as the proportion of deletion gets larger. Also, edge deletion on high degree nodes was not detected until the proportion of deletion is at least $$6\%$$ with $$100\%$$ power, and the power only decreases slightly to $$80.2\%$$ when the proportion of deletion is at $$20\%$$ which is considerably better than the simulation result for edge deletion on high degree nodes (Table 9). Detection of node deletion is not as good since node deletion on high degree nodes was not able to be detected, and for node deletion on low degree nodes did not occur until the proportion of deletion is at $$40\%$$ albeit with $$87.1\%$$ power. Despite, overall, low power to detect changes via node deletion, the results do reflect those obtained from the simulation for node deletion (Table 13). Fig. 12. 
Power vs $$p$$ for deletion on (a) low- and (b) high-degree nodes of the Karate network with $$m=3$$ and $$k=6$$. Black: edge deletion; gray: node deletion.

For the Netscience sub-network, Fig. 13 shows that degradation was detected at $$4\%$$ and $$5\%$$ edge deletion on low- and high-degree nodes, respectively, which is comparable to the simulation result for low degree but considerably better for high degree (Table 9). A power of $$100\%$$ was also maintained from $$6\%$$ and $$7\%$$ edge deletion onward on low- and high-degree nodes, respectively, whereas the simulation did not attain this power. Node deletion gave similar results: detection was achieved at $$6\%$$ and $$8\%$$ node deletion on low- and high-degree nodes, respectively, and $$100\%$$ power was maintained from $$15\%$$ node deletion onward on both low- and high-degree nodes (Fig. 13). This is noticeably better than the simulation result, where the power was very low for the comparable $$m$$ and $$k$$ (Fig. 11 and Table 13).

Fig. 13. Power vs $$p$$ for deletion on (a) low- and (b) high-degree nodes of the Netscience sub-network with $$m=3$$ and $$k=5$$. Black: edge deletion; gray: node deletion.

The results for the Polbooks sub-network are less promising than for the other two applications. For edge deletion, Fig. 14 shows that the detection rate is high when deleting edges on low-degree nodes, even for small proportions of deletion ($$92.4\%$$ power at $$3\%$$ deletion), but it worsens for medium- and high-degree nodes, where as much as $$20\%$$ and $$30\%$$ edge deletion, respectively, is required to achieve roughly $$60\%$$ power. This is, however, consistent with the simulation, where power decreases as the degree level increases (Fig. 8 and Table 9). Detection of node deletion was weak, although again consistent with the simulation: the highest power achieved was $$29.9\%$$, $$72.4\%$$ and $$49.1\%$$ at $$50\%$$ node deletion on low-, medium- and high-degree nodes, respectively (Fig. 14).

Fig. 14. Power vs $$p$$ for deletion on (a) low-, (b) medium- and (c) high-degree nodes of the Polbooks sub-network with $$m=5$$ and $$k=6$$. Black: edge deletion; gray: node deletion.

In all of these examples, the test we developed was more sensitive to network degradation via edge deletion than via node deletion. Further, the sensitivity of the test was relatively high at small proportions of deletion, even for low- and medium-degree nodes. Although we considered all of the data sets in Table 15, we examined only those whose structure we could test and verify as Barabási–Albert. Our test may still be sensitive, though perhaps less powerful, in detecting degradation in the other networks, which we could not classify as Barabási–Albert from their degree distributions.

7. Discussion

Although the test derived in Section 4 is based on the empirical degree distribution of the network, it still requires the assumption that the Barabási–Albert degree distribution follows the $$Pareto(m,\beta)$$ distribution with $$\beta> 1$$. The mean of the distribution must be finite for the second and higher-order L-moments to exist, and the mean is not defined if $$\beta\leq 1$$.

The test on $$\lambda_2$$ was shown to have good power when used to test for a Barabási–Albert network. Extending it to a multivariate test by adding $$\tau_3$$ and $$\tau_4$$ improved the power significantly when the network is small. This initial step of truth classification is important because the test must be able to classify the network to the ground truth with high power. These results also show that the test acts not only as a test for network degradation but also as a test for network growth as a function of the $$m$$ parameter (minimum degree), since it detects with high power whether the network's minimum degree has increased.

Based on the sensitivity analysis, the test is more sensitive to change from edge deletion than from node deletion and can detect a change in the network with as little as $$1\%$$ of the edges deleted. In addition, adding $$\tau_3$$ and $$\tau_4$$ to the test on $$\lambda_2$$ provides a notable and significant gain in sensitivity to change over $$\lambda_2$$ alone. The multivariate test for change detection using $$(\lambda_2,\tau_3,\tau_4)$$ is likewise more sensitive to edge deletion than to node deletion, since it detects changes at a smaller proportion of deletion ($$p$$). In other words, the test detects degradation caused by edge deletion much sooner than degradation caused by node deletion.
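To make the $$\beta>1$$ requirement and the quantities entering the multivariate test concrete, the following Python sketch computes the population L-moments of a continuous $$Pareto(m,\beta)$$ distribution with survival function $$(m/x)^{\beta}$$ for $$x\geq m$$, together with the usual unbiased sample L-moments of a degree sequence obtained from probability-weighted moments [16]. The closed forms follow from writing the Pareto quantile function as a generalized Pareto quantile with shape $$-1/\beta$$; they illustrate the continuous approximation and are not necessarily the exact expressions derived in Section 4. The function names and the example degree sequence are hypothetical.

```python
def pareto_l_moments(m: float, beta: float) -> tuple:
    """Population L-moments (lambda1, lambda2, tau3, tau4) of Pareto(m, beta).

    Assumes the continuous Pareto type I law with survival (m / x) ** beta, x >= m.
    Its quantile m * (1 - F) ** (-1 / beta) is a generalized Pareto quantile with
    location m, scale m / beta and shape -1 / beta, which gives the closed forms
    below. L-moments exist only when the mean does, so beta must exceed 1.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    if beta <= 1:
        raise ValueError("beta must exceed 1: the mean, and hence the L-moments, do not exist")
    lam1 = m * beta / (beta - 1)
    lam2 = m * beta / ((beta - 1) * (2 * beta - 1))
    tau3 = (beta + 1) / (3 * beta - 1)
    tau4 = (beta + 1) * (2 * beta + 1) / ((3 * beta - 1) * (4 * beta - 1))
    return lam1, lam2, tau3, tau4


def sample_l_moments(data) -> tuple:
    """Sample L-moments (l1, l2, t3, t4) from unbiased probability-weighted moments b0..b3."""
    x = sorted(float(v) for v in data)
    n = len(x)
    if n < 4:
        raise ValueError("at least 4 observations are needed for t4")
    b = [0.0, 0.0, 0.0, 0.0]
    for j, xj in enumerate(x):  # j = (rank - 1) in ascending order
        b[0] += xj
        b[1] += xj * j / (n - 1)
        b[2] += xj * j * (j - 1) / ((n - 1) * (n - 2))
        b[3] += xj * j * (j - 1) * (j - 2) / ((n - 1) * (n - 2) * (n - 3))
    b = [bk / n for bk in b]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    if l2 == 0:
        raise ValueError("all observations are equal; L-moment ratios are undefined")
    return l1, l2, l3 / l2, l4 / l2


# Hypothetical usage: a short, made-up degree sequence with minimum degree m = 3.
degrees = [3, 3, 3, 3, 4, 4, 5, 6, 8, 12]
print(pareto_l_moments(3, 2))
print(sample_l_moments(degrees))
```

For example, with $$m=3$$ and $$\beta=2$$ (the tail index implied by the Barabási–Albert degree exponent of 3 in the continuous approximation), `pareto_l_moments(3, 2)` gives $$\lambda_1=6$$, $$\lambda_2=2$$, $$\tau_3=0.6$$ and $$\tau_4=3/7\approx 0.429$$, while any $$\beta\leq 1$$ raises an error because $$\lambda_1$$, the mean, is infinite.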
Recall that degradation is defined here as the removal of nodes or edges within the network that changes the structure of the network. Finally, the power analysis of the test in Section 4.2 shows that the multivariate test detects, with good power, whether the minimum degree of the network has changed, even when the minimum degree is not small. Our application of the multivariate test to real-world network data in Section 6 supports these results.

In addition, our results are broadly consistent with earlier findings on (generic) communication within the network. Like others [1, 2], we found that the deletion of high-degree nodes in larger networks reduces communication (via effects on diameter and clustering coefficient), and we found the same to be true for edge deletion. Further, our statistical test detected degradation of the network with as little as about $$10\%$$ of the nodes deleted. Although communication may not be affected at this level of (random) node deletion, early detection is paramount for networks of critical infrastructure. Our test provides a way to detect these changes early, and even earlier when edges are deleted. This is especially critical during an attack, when safeguards must be put in place to preserve network performance.

Our work also examined very small networks, which have received less attention in the literature. For node deletion, the overall performance of the multivariate test increases gradually as a function of $$p$$, but the test is practically unusable for smaller networks when only the high-degree nodes are affected. This is because smaller networks contain only a few high-degree nodes, so if the proportion of deletion $$p$$ is small, it is likely that none of them is deleted (Table 16).
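The effect of a small $$p$$ on a small network can be quantified with simple arithmetic. The Python sketch below contrasts two deletion rules, both assumptions for illustration since the exact rule is the one specified in Section 5: removing $$\lfloor p\,n\rfloor$$ of the $$n$$ eligible high-degree nodes, or removing each eligible node independently with probability $$p$$, in which case no node is removed with probability $$(1-p)^{n}$$. The function names are hypothetical, and the node counts are read from the "Nodes affected" column of Table 16 under the assumption that it counts the eligible high-degree nodes.

```python
import math


def nodes_removed_floor(n_eligible: int, p: float) -> int:
    """Assumed rule 1: remove floor(p * n) of the n eligible high-degree nodes."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return math.floor(p * n_eligible)


def prob_none_removed(n_eligible: int, p: float) -> float:
    """Assumed rule 2: remove each eligible node independently with probability p.

    Returns the probability that no eligible node is removed at all."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return (1.0 - p) ** n_eligible


# Eligible-node counts at the upper end of the k = 5 range and the lower end of
# the k = 14 range in Table 16 (read here as numbers of high-degree nodes).
for n in (3, 166):
    print(n, nodes_removed_floor(n, 0.01), round(prob_none_removed(n, 0.01), 3))
```

Under either assumed rule, $$p=0.01$$ almost never touches a network with at most three high-degree nodes: the first rule removes none, and the second leaves all of them in place with probability $$0.99^3\approx 0.970$$. With the roughly 166 eligible nodes of the $$k=14$$ case, the first rule removes one node and the probability that the second removes none falls to about $$0.19$$.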
Table 16 shows that for $$p=0.01$$, no edges are deleted when $$k=5$$. However, this is an artifact of our simulation and should not be expected in real-world applications, where a probability of deletion this low is very unlikely. It would therefore be interesting to study the performance of the test on smaller networks over a wider range of $$p$$. These results can also be explained by the statistics of the deletion processes under the various conditions: edge deletion created some isolates for almost all combinations of degree level, network size and proportion of deletion, whereas node deletion created isolates only for medium- and high-degree deletion on larger networks with a high proportion of deletion. Thus, if the objective is to affect a network in a way that changes its characteristics, the focus should be placed on the connectors (i.e. edges) within the network as opposed to its entities (i.e. nodes). Conversely, if the objective is to detect whether the network has degraded, the multivariate test for change detection using $$(\lambda_2,\tau_3,\tau_4)$$ detects degradation caused by edge deletion with high power.
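The multivariate test itself is defined in Section 4 and is not reproduced here. As an illustration only, the Python sketch below assumes a Wald-type quadratic form: the observed vector $$(\lambda_2,\tau_3,\tau_4)$$ is compared with reference values under an assumed covariance matrix, and the statistic is referred to a $$\chi^2_3$$ distribution, whose survival function has the closed form $$\mathrm{erfc}(\sqrt{x/2})+\sqrt{2x/\pi}\,e^{-x/2}$$. The sketch also estimates power as the fraction of Monte Carlo replicates that reject, in the spirit of the $$1000$$ deletion samples per degree level used in Section 6, together with the binomial standard error $$\sqrt{\hat\pi(1-\hat\pi)/R}$$ of that estimate. All names, the reference values and the covariance in the example are hypothetical, and `draw_degraded` merely stands in for degrading a network and recomputing its sample L-moments.

```python
import math
import random
from typing import Callable, Sequence


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> list:
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def chi2_sf_3df(x: float) -> float:
    """Survival function P(X > x) of a chi-square distribution with 3 degrees of freedom."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def wald_test(observed, reference, cov, alpha=0.05):
    """Assumed Wald-type test of (lambda2, tau3, tau4): returns (statistic, p_value, reject)."""
    diff = [o - r for o, r in zip(observed, reference)]
    stat = sum(d * z for d, z in zip(diff, solve_linear(cov, diff)))
    p_value = chi2_sf_3df(stat)
    return stat, p_value, p_value < alpha


def estimate_power(draw_observed: Callable[[], Sequence[float]], reference, cov,
                   n_reps=1000, alpha=0.05):
    """Fraction of replicates in which the test rejects, with its binomial standard error."""
    rejections = sum(wald_test(draw_observed(), reference, cov, alpha)[2] for _ in range(n_reps))
    power = rejections / n_reps
    return power, math.sqrt(power * (1.0 - power) / n_reps)


# Hypothetical reference values (Pareto m = 3, beta = 2) and an assumed diagonal covariance.
reference = (2.0, 0.6, 3 / 7)
cov = [[0.04, 0.0, 0.0], [0.0, 0.0025, 0.0], [0.0, 0.0, 0.0025]]
rng = random.Random(2018)


def draw_degraded():
    # Stand-in for "degrade the network, then recompute the sample L-moments":
    # lambda2 is simply shifted upward by 0.3, with normal noise on every component.
    return (rng.gauss(2.3, 0.2), rng.gauss(0.6, 0.05), rng.gauss(3 / 7, 0.05))


print(estimate_power(draw_degraded, reference, cov))
```

The standard error helps when reading the power values of Section 6: assuming each is a proportion over $$R=1000$$ replicates, a reported power of $$78.2\%$$ carries a standard error of $$\sqrt{0.782\times 0.218/1000}\approx 0.013$$, i.e. about 1.3 percentage points.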
Table 16 Nodes affected and edges deleted from node deletion

| $$m$$ | $$k$$ | Nodes affected ($$95\%$$ CI) | Edges deleted, $$p=0.01$$ ($$95\%$$ CI) | Edges deleted, $$p=0.5$$ ($$95\%$$ CI) |
|---|---|---|---|---|
| 1 | 5 | $$(1,3)$$ | $$(0,0)$$ | $$(0,8)$$ |
| 1 | 14 | $$(166,207)$$ | $$(11,47)$$ | $$(1362,1709)$$ |
| 2 | 5 | $$(1,2)$$ | $$(0,0)$$ | $$(0,12)$$ |
| 2 | 14 | $$(164,187)$$ | $$(21,113)$$ | $$(2749,3410)$$ |
| 3 | 5 | $$(1,2)$$ | $$(0,0)$$ | $$(0,15)$$ |
| 3 | 14 | $$(164,179)$$ | $$(31,162)$$ | $$(4159,5117)$$ |
| 4 | 5 | $$(1,3)$$ | $$(0,0)$$ | $$(0,18)$$ |
| 4 | 14 | $$(164,176)$$ | $$(42,221)$$ | $$(5556,6834)$$ |
| 5 | 5 | $$(1,3)$$ | $$(0,0)$$ | $$(0,20)$$ |
| 5 | 14 | $$(164,173)$$ | $$(52,303)$$ | $$(7054,8532)$$ |
| 6 | 5 | $$(1,3)$$ | $$(0,0)$$ | $$(0,22)$$ |
| 6 | 14 | $$(164,172)$$ | $$(62,332)$$ | $$(8440,10320)$$ |
| 7 | 5 | $$(1,3)$$ | $$(0,0)$$ | $$(0,23)$$ |
| 7 | 14 | $$(164,171)$$ | $$(73,367)$$ | $$(9938,12020)$$ |

The test on the L-moments is not constrained to the Barabási–Albert model, since it is based on the empirical L-moments of the network in question; although the Barabási–Albert network is the sole focus of this research, the method can be applied to other network models [40–42]. However, to apply the multivariate test in a real-world application, a method is required for characterizing the network in question by a suitable model proxy, here a Barabási–Albert network.
Once such a proxy is established, the properties associated with it, such as the scale-free or small-world properties, can be directly linked to the real network in question, and the network can be monitored using the proposed test on L-moments. Finally, the test on L-moments is not restricted to the degree distribution of the network; there are other nodal measures in the literature not considered here [43]. The approach of this article can be used to develop tests on these other measures that provide additional insight into changes within the network.

8. Conclusion

We developed a test to detect degradation within networks that can be characterized as Barabási–Albert networks. Detecting such degradation before network performance, in the context of the network's application, is affected is important. We have shown that our test is fairly sensitive to changes, especially the removal of edges, since it can detect a change with as little as $$1\%$$ of the edges affected. This performance was reproduced when the test was applied to real-world networks whose structures were characterized as Barabási–Albert, which demonstrates the practical feasibility of the test. Further, our test shows promise for the future application of fitting a network model proxy to an empirical network, to aid the network analyst in monitoring networks through visualization.

References

1. Albert R., Jeong H. & Barabási A.-L. (2000) Error and attack tolerance of complex networks. Nature, 406, 378–382.
2. Callaway D. S., Newman M. E., Strogatz S. H. & Watts D. J. (2000) Network robustness and fragility: percolation on random graphs. Phys. Rev. Lett., 85, 5468.
3. Holme P., Kim B. J., Yoon C. N. & Han S. K. (2002) Attack vulnerability of complex networks. Phys. Rev. E, 65, 056109.
4. Iyer S., Killingback T., Sundaram B. & Wang Z. (2013) Attack robustness and centrality of complex networks. PLoS One, 8, e59613.
5. Radicchi F. (2015) Percolation in real interdependent networks. Nat. Phys., 11, 597–602.
6. Sun S., Wu Y., Ma Y., Wang L., Gao Z. & Xia C. (2016) Impact of degree heterogeneity on attack vulnerability of interdependent networks. Sci. Rep., 6, 32983.
7. Albert R., Jeong H. & Barabási A.-L. (1999) Internet: diameter of the world-wide web. Nature, 401, 130–131.
8. Barabási A.-L. & Albert R. (1999) Emergence of scaling in random networks. Science, 286, 509–512.
9. Clauset A., Shalizi C. R. & Newman M. E. (2009) Power-law distributions in empirical data. SIAM Rev., 51, 661–703.
10. Newman M. E. (2005) Power laws, Pareto distributions and Zipf’s law. Contemp. Phys., 46, 323–351.
11. Zhao Z.-D., Yang Z., Zhang Z., Zhou T., Huang Z.-G. & Lai Y.-C. (2013) Emergence of scaling in human-interest dynamics. Sci. Rep., 3, 3472.
12. Fienberg S. E. (2012) A brief history of statistical models for network analysis and open challenges. J. Comput. Graph. Stat., 21, 825–839.
13. Moonesinghe H., Valizadegan H., Fodeh S. & Tan P.-N. (2007) A probabilistic substructure-based approach for graph classification. 19th IEEE International Conference on Tools with Artificial Intelligence, 2007. ICTAI 2007, Vol. 1. Los Alamitos, CA: IEEE Computer Society, pp. 346–349.
14. Mowshowitz A. & Dehmer M. (2012) Entropy and the complexity of graphs revisited. Entropy, 14, 559–570.
15. Pareto V. & Busino G. (1965) Ecrits sur la courbe de la repartition de la richesse: reunis et presentes par G. Busino (Originally published in 1896), Travaux de droit, d’économie, de sociologie et de sciences politiques. Geneve: Droz.
16. Hosking J. R. (1990) L-moments: analysis and estimation of distributions using linear combinations of order statistics. J. R. Stat. Soc. Ser. B, 52, 105–124.
17. Gini C. (1912) Variabilitá e mutabilitá, contributo allo studio delle distribuzioni e delle relazione statistiche. Stud. Econ.-Giur. R. Univ. Cagl., 3, 3–159.
18. Sillitto G. P. (1951) Interrelations between certain linear systematic statistics of samples from any continuous population. Biometrika, 56, 377–382.
19. Sillitto G. P. (1969) Derivation of approximants to the inverse distribution function of a continuous univariate population from the order statistics of a sample. Biometrika, 56, 641–650.
20. Downton F. (1966) Linear estimates with polynomial coefficients. Biometrika, 53, 129–141.
21. Chan L. K. (1967) On a characterization of distributions by expected values of extreme order statistics. Amer. Math. Monthly, 74, 950–951.
22. Konheim A. (1971) A note on order statistics. Amer. Math. Monthly, 78, 524.
23. Mallows C. (1973) Bounds on distribution functions in terms of expectations of order-statistics. Ann. Probability, 1, 297–303.
24. Greenwood J. A., Landwehr J. M., Matalas N. C. & Wallis J. R. (1979) Probability weighted moments: definition and relation to parameters of several distributions expressable in inverse form. Water Resour. Res., 15, 1049–1054.
25. Wang Q. (1996) Direct sample estimators of L-moments. Water Resour. Res., 32, 3617–3619.
26. Hosking J. R. (1986) The theory of probability weighted moments. Discussion Paper Research Report RC12210. Yorktown Heights, N.Y.: IBM Research Division.
27. Elamir E. A. & Seheult A. H. (2004) Exact variance structure of sample L-moments. J. Stat. Plan. Inference, 124, 337–359.
28. Royston J. P. (1983) Some techniques for assessing multivarate normality based on the Shapiro-Wilk W. J. R. Stat. Soc. Ser. C, 32, 121–133.
29. Mohd-Zaid F., Schubert Kabban C. M., Deckro R. F. & White E. D. (2017) Parameter specification for the degree distribution of simulated Barabási–Albert graphs. Phys., 465, 141–152.
30. Zachary W. (1977) An information flow model for conflict and fission in small groups. J. Anthropol. Res., 33, 452–473.
31. Lusseau D., Schneider K., Boisseau O. J., Haase P., Slooten E. & Dawson S. M. (2003) The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol., 54, 396–405.
32. Knuth D. E. (1993) The Stanford GraphBase: A Platform for Combinatorial Computing. ACM Press.
33. Newman M. E. (2013) Network data. http://www-personal.umich.edu/~mejn/netdata/ (accessed on 7 May 2016).
34. Adamic L. A. & Glance N. (2005) The political blogosphere and the 2004 U.S. Election: divided they blog. Proceedings of the 3rd International Workshop on Link Discovery, LinkKDD ’05. New York, NY, USA: ACM, pp. 36–43.
35. Newman M. E. (2006) Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E, 74, 036104.
36. McAuley J. J. & Leskovec J. (2014) Discovering social circles in ego networks. ACM Transactions on Knowledge Discovery from Data (TKDD), Casin special issue, Vol. 8. New York, NY, USA: ACM, pp. 4:1–4:28.
37. Newman M. E. (2001) The structure of scientific collaboration networks. Proc. Nat. Acad. Sci., 98, 404–409.
38. Leskovec J., Lang K. J., Dasgupta A. & Mahoney M. W. (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math., 6, 29–123.
39. Leskovec J. & Krevl A. (2014) SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data (accessed on 9 December 2015).
40. Morris J. F., ONeal J. W. & Deckro R. F. (2014) A random graph generation algorithm for the analysis of social networks. J. Defense Modeling and Simulation: Applications, Methodology, Technology, 11, 265–276.
41. Small M., Judd K. & Zhang L. (2014) How is that complex network complex? IEEE International Symposium on Circuits and Systems (ISCAS), 2014. Los Alamitos, CA: IEEE Computer Society, pp. 1263–1266.
42. Watts D. J. & Strogatz S. H. (1998) Collective dynamics of ‘small-world’ networks. Nature, 393, 440–442.
43. Wasserman S. & Faust K. (1994) Social Network Analysis: Methods and Applications. Cambridge University Press.

Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.

Journal of Complex Networks – Oxford University Press

**Published:** Feb 1, 2018
