TY - JOUR AB - Abstract Distributed Denial-of-Service (DDoS) is one of the most destructive network attacks. In Socially Aware Networking (SAN), there are many problems in current detection methods, such as low flexibility in detecting different attacks, high false-negative and false-positive rates. In this paper, we propose a DDoS detection method for SAN based on fusion feature series forecasting. Specifically, we define a multi-protocol-fusion feature (MPFF) to characterize normal network flows. Moreover, we utilize the time-series Autoregressive Integrated Moving Average Model (ARIMA) to formally describe the MPFF sequence, which is subsequently used in network flow forecasting and error calculation. Finally, we present the ARIMA detection model with error correction based on MPFF time series to identify DDoS in SAN. The experimental results show that the proposed method can effectively distinguish attacking flows from normal ones. Compared with previous DDoS detection methods for SAN, the proposed method can achieve better performance of detecting DDoS in terms of detection rate, false-positive rate and time delay. 1. INTRODUCTION In recent years, with the rapid development of the Internet and the increasing popularity of intelligent terminals, socially aware networking (SAN) has become a frontier research field of various disciplines [1–5]. The survivability of the future Internet largely depends on whether it will be able to successfully address both security and performance issues [6]. Distributed denial-of-service (DDoS) is an attack derived from denial-of-service attacks on the Internet [7]. An attacker uses a client or server technology to combine multiple computers as a platform to attack one or more targets, thereby enhancing the power of the attack [8]. Devastating, widespread, frequent and increasingly complex DDoSs have become one of the most important threats to the Internet today [9]. 
To reduce the harm of DDoSs, academia and industry have proposed many methods to detect them. With the growing popularity of SAN, people pay more and more attention to the individual behavior of socially aware networking users [10]. Modern information technology can extract useful information from the social network of individual behavior [11], and can even use community discovery algorithms to distinguish overlapping complex social networks and provide accurate network services [12]. Therefore, when people use the cloud to store individual behavior information [13], they need to verify whether the cloud service providers keep their data securely [14]. However, the continuous development of network technology makes DDoS attack methods more complex and diverse in the context of socially aware networking user behavior. Attacks become more destructive, and it is very difficult to detect multiple kinds of DDoS based on a single characteristic of network flows. The relationship among SAN users is shown in Fig. 1. The different nodes in the graph represent different users. The whole network represents the relationship among different users. The one-to-many relationship among SAN users is similar to DDoS. Also, a multi-feature DDoS detection method has certain hysteresis due to the complexity of the algorithm. Identifying DDoS early and accurately is even more difficult. Figure 1. User relationship diagram in SAN. To overcome the above problems, this paper analyzes traffic characteristics in normal and attack flows under the three common protocols (TCP, UDP and ICMP). Accordingly, we define a multi-protocol-fusion feature (MPFF) based on common characteristics of DDoS, such as suddenness and asymmetry, the distributed feature of source IP addresses, and the concentrated feature of destination IP addresses. Then, we propose a DDoS detection method based on MPFF. 
Specifically, the proposed method uses an ARIMA model with time-series forecasting to model the MPFF sequence. Based on this model, both dynamic and static forecasting methods are used to forecast the measured flow. After that, we use Fourier series to reduce forecasting errors. Finally, we determine the optimal normal flow threshold to facilitate DDoS detection. The experimental results show that the proposed method can effectively distinguish attacking flows from normal ones. Compared with previous DDoS detection methods for SAN, the proposed method achieves better DDoS detection performance in terms of detection rate, false-positive rate and time delay. The rest of this paper is organized as follows. Section 2 reviews the related work. Section 3 analyzes the characteristics of DDoS. Section 4 gives the details of the MPFF feature. Section 5 presents the DDoS detection method based on MPFF. Section 6 presents comparative results to evaluate the performance of the proposed method. Finally, Section 7 gives some concluding remarks. 2. RELATED WORK In recent years, Internet of Things technology has become more and more important, bringing great convenience to people's lives and urban development [15]. At the same time, attackers use the Internet of Things and social networks to launch DDoSs. Network intrusion detection is one of the most important parts of cyber security for protecting computer systems against malicious attacks [16]. Since machine-learning methods can effectively improve the accuracy of attack recognition [17, 18], more and more researchers detect DDoSs based on machine learning [19]. Since most of these attacks are launched abruptly and severely [20], a fast intrusion prevention system is desirable to detect and mitigate these attacks as soon as possible [21]. Existing detection methods can be divided into two categories: those based on statistics and those based on classification. 2.1. 
Detection methods based on statistics DDoS detection methods based on statistics identify changes in traffic characteristics caused by DDoSs or changes of packet information structure, and set a threshold to define whether an attack exists. Its research includes the study of entropy, the Hurst parameter and flow matrices. David et al. [22] proposed a fast entropy method based on flow analysis. Zhang Dong-mei et al. [23] proposed an attack-detection method based on self-similarity of networks. Sang Min Lee et al. [24] used a genetic algorithm (GA) to optimize the parameters of the flow matrix, greatly improving the detection rate. 2.2. Detection methods of DDoS based on classification DDoS-detection based on classification extracts the featured sample series from the network flow and uses a machine-learning algorithm to study the training samples. A classifier model is then established to classify the samples to be tested and distinguish normal flow from attack flow to detect DDoS. The main research is focused on support vector machines (SVMs), naïve Bayes (NB) algorithms and decision trees. Karnwal et al. [25] turned a 1D time series into a multi-dimensional time series for an accuracy-rate (AR) model parameter, and used a support vector machine to study and classify the data stream. Tama et al. [26] modeled the network data stream according to the header attribute, and scored each incoming data stream using an NB algorithm. Rabia Latif et al. [27] proposed an enhanced decision-tree algorithm that can effectively detect the occurrence of DDoS in cloud-assisted Wireless Body Area Network (WBAN). Compared with some traditional detection methods for DDoSs, time-series forecasting has the advantages of high data dimension, real order and self-correlation between data [28]. Time-series forecasting methods have been used in many fields in recent years [29]. Pertaining to time series in DDoS detection, Nezhad et al. 
[30] proposed a detection algorithm of DoS and DDoSs using an ARIMA time-series model and chaotic system in a computer network. However, this method can only detect a single type of DDoS. Andrysiak et al. [31] used conditional heteroscedastic methods to optimize the parameters of a time-series model to detect DDoSs, but the algorithm was complex and the detection results showed a certain hysteresis. The MPFF feature proposed in this paper can solve the problems of a time-series forecasting method in DDoS detection. In this paper, the traffic characteristics of the three protocols are extracted according to different characteristics of the network flow under different protocols, and then the traffic characteristics of these three protocols are fused according to the MPFF algorithm. Compared with detection based on a single characteristic of network flows, the detection method based on MPFF can fully utilize the message information and improve the rate of detection of DDoSs. Compared with a multi-feature detection method, a method based on MPFF has a simpler algorithm and is more effective. We improve the real-time performance of MPFF time-series modeling for DDoS attack detection in this paper by using an ARIMA model to combine both dynamic and static forecasting methods. To improve the effectiveness and robustness of DDoS detection, we propose an ARIMA model with error correction by Fourier series based on MPFF time series. 3. THE CHARACTERISTICS OF DDOS IN SAN The DDoS dataset published by Lincoln Lab [32] and the DARPA dataset issued by CAIDA [33] indicate that the flow of the typical DDoS has characteristics such as flow burst, distributive source IP address and asymmetric flow. The flow asymmetry has a many-to-one relationship, with the following specific performance: The source IP address and destination IP address of the network flow have a many-to-one relationship. 
According to the burstiness of attack flow and the distribution of the source IP addresses, when a DDoS occurs, many puppet machines are controlled to attack the victim simultaneously. The attacker can forge the source IP address of the attack packet continuously or randomly, making the distribution of source IP addresses more scattered and creating many source IP addresses. The source IP address of network flow and the destination port have a many-to-one relationship. An attacker of a service on the target host sends a large amount of data to a port on the target host. The destination port and destination IP address of the network flow have a many-to-one relationship. The number of destination ports is relatively stable in the normal state of a network. However, when a DDoS occurs, the attacker requests multiple services from the target machine and randomly generates different port numbers. Because the attacker can initiate one or several of the techniques of TCP flood, UDP flood and ICMP flood to avoid detection, this paper will analyze the traffic characteristics of TCP, UDP and ICMP to combine the traffic characteristics of three types of protocol and design the traffic characteristics of MPFF: The first step in effectively analyzing the characteristics of the TCP protocol state and accurately distinguishing DDoSs and flash-crowd events [34] is to exclude the interference of normal flow on feature extraction. Then, according to the burstiness of traffic and dispersive distribution of source IPs, the statistics of the source IP address of the TCP protocol packet, destination IP address and destination port number are obtained to calculate the characteristics of the TCP protocol flow. When UDP-based DDoSs occur, the number of ICMP packets that cannot reach the destination port increases rapidly. The abnormal change of this parameter reflects the occurrence of DDoSs. Thus, this can be used as the basis for attack determination [35]. 
For a DDoS-detection method based on the ICMP protocol, the request and response should have the same order of magnitude in the normal state [36]. If there are many requests to the destination IP address but the response is small, the occurrence of ICMP-based DDoSs can be determined. 4. DETAILS OF THE MULTI-PROTOCOL-FUSION FEATURE According to the above analysis regarding the characteristics of DDoS under three types of protocols, we define a multi-protocol-fusion feature (MPFF) to comprehensively describe DDoS behaviors based on TCP, UDP and ICMP. The three types of characteristic parameters are extracted and then fused. 4.1. Definition of MPFF features Definition 1 Within a certain time ΔT, assume the IP network flow NF (net flow) <(t1,S1,D1,P1,N1,C1,R1),(t2,S2,D2,P2,N2,C2,R2),…,(tn,Sn,Dn,Pn,Nn,Cn,Rn)>, where i=1,2,…,n; ti represents the time of the ith packet; Si and Di represent the source and destination IP addresses of the ith packet, respectively. Pi represents the protocol, Ni refers to the ICMP type field, Ci is the ICMP code field and Ri means the destination port number of the ith packet. The packets with the same source IP address and destination IP address are in the same class, i.e. all the packets whose source IP address is Ai and destination IP address is Aj are SD(Ai,Aj). When a DDoS occurs, the source IP address and destination IP address of the network flow have a many-to-one relationship. If the same source IP address Ai sends packets to multiple destination IP addresses Aj and Ak, i.e. SD(Ai,Aj) and SD(Ai,Ak) are not empty, then the source IP address has a one-to-many access mode, it can be determined as a non-attack flow, and we can delete all data packets of the source IP address. 
Similarly, if the destination IP address $A_j$ receives messages from one and only one source IP address, such that packets with the destination IP address $A_j$ form only one class, $SD(A_i, A_j)$, then the destination IP address has a one-to-one access mode, which indicates normal traffic, and all packets of the destination IP address can be deleted. Definition 2 After deleting the classes with one-to-one or one-to-many access mode, the data packets of the IP network flow NF with the same destination IP address are classified in the same class, i.e. the remaining packets are grouped by destination address into the classes $SDR_1, SDR_2, SDR_3, \ldots, SDR_k$. Among them, the class formed by TCP packets that have $A_j$ as the destination IP address is $SDR_k(A_j)$. Definition 3 The network flow NF is sampled with a time interval of $\Delta t$, and the traffic characteristic of NF based on the TCP protocol is
$$TNF = \frac{1}{k}\left(\sum_{i=1}^{k} W(SDR_i) - k\right), \tag{1}$$
where
$$W(SDR_i) = \alpha\,\mathrm{Num}(SDR_i) + \beta \sum_{j=1}^{\mathrm{Num}(SDR_i)} \mathrm{OverA}(\mathrm{Packet}_i(A_j)) + (1-\alpha-\beta)\,\mathrm{OverB}(\mathrm{Port}(SDR_i)), \quad 0 \le \alpha \le 1,\ 0 \le \beta \le 1.$$
$\mathrm{Num}(SDR_i)$ is the number of different source IP addresses among the packets in $SDR_i$, and $k$ is the number of SDR classes:
$$\mathrm{OverA}(\mathrm{Packet}_i(A_j)) = \begin{cases} \mathrm{Packet}_i(A_j) - \theta_1, & \mathrm{Packet}_i(A_j)/\Delta t \ge \theta_1 \\ 0, & \mathrm{Packet}_i(A_j)/\Delta t < \theta_1, \end{cases} \tag{2}$$
where $A_j$ represents the source IP address, and $\mathrm{Packet}_i(A_j)$ is the number of packets whose source IP address is $A_j$ in class $SDR_i$:
$$\mathrm{OverB}(\mathrm{Port}(SDR_i)) = \begin{cases} \mathrm{Port}(SDR_i) - \theta_2, & \mathrm{Port}(SDR_i)/\Delta t \ge \theta_2 \\ 0, & \mathrm{Port}(SDR_i)/\Delta t < \theta_2, \end{cases} \tag{3}$$
where $\Delta t$ is the sampling time interval; $\theta_1$ and $\theta_2$ are thresholds; and $\mathrm{Port}(SDR_i)$ is the number of different destination port numbers in class $SDR_i$. In the definition of TNF, the network traffic packets per time unit are classified according to the source IP address and the destination IP address, and then the influence of the normal network flow on the TNF result is excluded according to the DDoS characteristics. The attacker sends unwanted packets to the victim host with many false source IP addresses, so the value of $\mathrm{Num}(SDR_i)$ also increases when a DDoS occurs. 
In addition, according to the characteristics of a DDoS, when multiple attacking source IPs send unwanted packets to a destination IP address, the number of packets in class $SDR_i$ having $A_j$ as source IP address per time unit, $\mathrm{Packet}_i(A_j)$, increases. When unwanted packets are sent from an attacking source IP to destination ports on a destination host in a unit of time, the number of different destination port numbers $\mathrm{Port}(SDR_i)$ in class $SDR_i$ also increases abnormally. The three parameters $\mathrm{Num}(SDR_i)$, $\mathrm{Packet}_i(A_j)$ and $\mathrm{Port}(SDR_i)$ are combined with different weights, and the weighted values of all classes $SDR_i$ in the network flow NF per time unit are summed to obtain the characteristic value TNF of the flow under the TCP protocol. In this paper, UNF is used to characterize the traffic of the UDP protocol. UNF is the total number of ICMP packets reporting an unreachable destination port within time interval $\Delta t$. The DDoS characteristic under the ICMP protocol is analyzed, and the traffic characteristic of the ICMP protocol is set to INF, i.e. the absolute value of the difference between the number of type 0 replies and type 8 requests in ICMP packets, RE = |REPLY − REQUEST|. 4.2. Feature fusion TNF, UNF and INF describe the characteristics of the network flow from the perspectives of the TCP, UDP and ICMP protocols, respectively, and each can detect only one type of DDoS. To improve the sensitivity of detection, an effective MPFF traffic characteristic is formed by combining the traffic characteristics of the three types of protocol and the characteristics of the three types of attack. That is, the traffic features of the three kinds of protocol are extracted, and the three characteristic parameters are fused and then used for ARIMA modeling and DDoS detection. If one of the detection methods indicates an attack, then the existence of the attack is determined. 
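The TCP feature of Eqs. (1)–(3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `tnf` and `overflow`, the packet representation as `(src_ip, dst_ip, dst_port)` tuples, the order in which the two filters of Definition 1 are applied, and the choice to return 0 when no class remains are all assumptions made for illustration. The default weights and thresholds are the values reported in Section 6.2.

```python
from collections import defaultdict


def overflow(count, dt, theta):
    """Eqs. (2)/(3): excess over theta once the per-unit-time rate reaches theta."""
    return count - theta if count / dt >= theta else 0.0


def tnf(packets, dt=1.0, alpha=0.35, beta=0.35, theta1=1.0, theta2=1.0):
    """TNF of one sampling window; packets are (src_ip, dst_ip, dst_port) tuples."""
    packets = list(packets)

    # Definition 1, step 1: drop sources that talk to several destinations.
    dsts_of_src = defaultdict(set)
    for src, dst, _ in packets:
        dsts_of_src[src].add(dst)
    kept = [p for p in packets if len(dsts_of_src[p[0]]) == 1]

    # Definition 1, step 2: drop destinations reached by a single source.
    srcs_of_dst = defaultdict(set)
    for src, dst, _ in kept:
        srcs_of_dst[dst].add(src)
    kept = [p for p in kept if len(srcs_of_dst[p[1]]) > 1]

    # Definition 2: one SDR class per remaining destination address.
    classes = defaultdict(list)
    for src, dst, port in kept:
        classes[dst].append((src, port))

    k = len(classes)
    if k == 0:
        return 0.0  # assumption: no suspicious class means TNF = 0

    total = 0.0
    for members in classes.values():
        per_src = defaultdict(int)  # Packet_i(A_j) for each source A_j
        for src, _ in members:
            per_src[src] += 1
        num = len(per_src)  # Num(SDR_i): distinct source addresses
        over_a = sum(overflow(c, dt, theta1) for c in per_src.values())
        over_b = overflow(len({port for _, port in members}), dt, theta2)
        total += alpha * num + beta * over_a + (1 - alpha - beta) * over_b
    return (total - k) / k  # Eq. (1)
```

In this sketch, traffic from a source that contacts several destinations, or to a destination contacted by a single source, contributes nothing, while many sources converging on one destination raise the value.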
Define a multi-protocol-fusion feature as
$$MPFF = \varphi \cdot TNF + \rho \cdot UNF + \omega \cdot INF, \quad 0 \le \varphi \le 1,\ 0 \le \rho \le 1,\ 0 \le \omega \le 1, \tag{4}$$
where $\varphi$, $\rho$ and $\omega$ are the sensitivity coefficients of the three characteristic parameters toward a network anomaly. The sensitivity coefficient is set larger for characteristic parameters that better reflect the difference between normal flow and abnormal flow. MPFF can only reflect current network flow changes. In this paper, the ARIMA model is established using a set of MPFF values that have an upward trend at the time of a DDoS. The ARIMA model is used to forecast the future MPFF value. Finally, the forecast value is used to determine whether the current MPFF trend indicates a possible DDoS. 5. DDOS DETECTION METHOD BASED ON MPFF 5.1. MPFF time-series modeling The data at the beginning of the attack were sampled at a time interval of $\Delta t$ and the MPFF value for each sample was calculated. After N samples, the MPFF time series $MPFF_i,\ i = 1, 2, \ldots, N$ shown in Fig. 2 was obtained. Figure 2. MPFF time series. The original MPFF time series was modeled in the Eviews 8.0 environment. We see from the original data sequence of Fig. 2 that the sequence is nonstationary and has an obvious temporal trend. So, we performed the Augmented Dickey-Fuller (ADF) stationarity test containing a constant and time trend for this time series. From the test results, we see that the P value of the original sequence is more than 5%. To remove the trend of the sequence, we apply first-order differencing to the original sequence and run the ADF unit root test on the differenced sequence. The test results are shown in Table 1.
Table 1. ADF stationarity test.
Test          stat       cValue     P
ADF           −17.158    −1.9411    0.001
White Noise   21.6059    12.5916    0.0014269
The test results show that the series becomes stationary after differencing. Since the P value of the ADF unit root test is less than 0.05, the null hypothesis of a unit root can be rejected at the 5% significance level. The P value of the white-noise test is also less than 0.05, indicating that the autocorrelation coefficient of at least one lag is significant. That is, the differenced sequence is stationary and is not white noise, and the original series is integrated of order one, meaning the difference order d=1. Based on this result, further examination of the autocorrelation and partial autocorrelation graphs of the differenced sequence shown in Fig. 3 roughly determines the orders as 0≤p≤1, 0≤q≤1. Within this range, the optimal model, ARIMA (1,1,1), was obtained by referring to the Akaike information criterion (AIC). Figure 3. Autocorrelation and partial correlation of MPFF sequence. The parameter verification and estimation results of the ARIMA (1,1,1) model using the least squares method are shown in Table 2. The P values were significant at 1%.
Table 2. Parameter estimation and test.
Variable   Coefficient   Std. error    t-Statistic   Prob.
AR{1}      0.997898      0.00829674    120.276       0.000
MA{1}      0.358497      0.0340011     10.5437       0.000
To further verify the rationality of the ARIMA model, Eviews 8.0 was used to perform a white-noise test on the forecast residuals of the model. The test outputs the residual sequence diagram (Fig. 4) and the ACF chart (Fig. 5). Figure 4. Diagram of residual time series. Figure 5. Residual autocorrelation and partial correlation. The P values of the residuals are far greater than 0.1 in the autocorrelation and partial autocorrelation graphs. According to the hypothesis-testing principle, at the 10% significance level, the null hypothesis 'this sequence is a white-noise sequence' is accepted. It can be concluded that the residual sequence estimated from the ARIMA model is a purely random sequence and the model is well fitted. In summary, the ARIMA (1,1,1) model was built. 5.2. Combined static and dynamic forecasting based on the ARIMA model Let the sequence of the obtained sample data be $x_1, x_2, \ldots, x_n$. Let $t$ represent the subscript of the current forecast value. When $t \le n$, the static forecasting method is used, which directly uses the previous sample data to forecast the next sample. When $t > n$, since there are $n$ sample data points, the sample $x_{n+1}$ can still be forecast using the static forecasting method. However, starting from the sample $x_{n+2}$, forecasting needs the value of $x_{n+1}$, and even more sample values after that. For example, when forecasting the value of $x_{n+2}$, the sample data to be used are $x_1, x_2, \ldots, x_n, x_{n+1}$, where $x_{n+1}$ is a forecast value rather than a true value, and when forecasting $x_{n+3}$, the sample data to be used are $x_1, x_2, \ldots, x_n, x_{n+1}, x_{n+2}$, where $x_{n+1}$ and $x_{n+2}$ are forecast values, and so on. 
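The difference between the two forecasting modes can be sketched in Python for an ARIMA (1,1,1) recursion. This is an illustrative sketch rather than the Eviews procedure used in the paper: the function name `forecast_arima111`, the sign convention of the MA term (w_t = phi·w_{t−1} + e_t + theta·e_{t−1} on the first difference w), the zero start-up innovation, and the absence of a constant term are assumptions.

```python
def forecast_arima111(x, phi, theta, steps):
    """Return (in-sample static forecasts, `steps` out-of-sample forecasts).

    Assumed model on the first difference w_t = x_t - x_{t-1}:
        w_t = phi * w_{t-1} + e_t + theta * e_{t-1}
    Requires len(x) >= 2 and steps >= 1.
    """
    w = [b - a for a, b in zip(x, x[1:])]  # first difference, d = 1
    static = []
    e_prev = 0.0  # assumed start-up innovation
    w_hat = 0.0
    for i in range(1, len(w) + 1):
        # Static mode: every forecast uses true observations only.
        w_hat = phi * w[i - 1] + theta * e_prev
        static.append(x[i] + w_hat)  # forecast of the sample after x[i]
        if i < len(w):
            e_prev = w[i] - w_hat  # innovation observed once w[i] is known
    # The last static forecast is x_{n+1}; later ones must reuse forecasts.
    dynamic = [static[-1]]
    for _ in range(steps - 1):
        # Dynamic mode: future innovations are unknown and set to zero,
        # so the differenced forecast decays by phi at every step.
        w_hat = phi * w_hat
        dynamic.append(dynamic[-1] + w_hat)
    return static[:-1], dynamic
```

With the coefficients of Table 2, a call such as `forecast_arima111(series, phi=0.997898, theta=0.358497, steps=50)` would produce a 50-point forecast window, where `series` is a hypothetical list of MPFF samples.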
If we use only the static forecasting method, we can forecast only one value at a time. By combining the dynamic and static models, we can obtain a sequence of forecast values with a sliding window. This way, we can get a better fitting result. 5.3. Fourier error correction The dynamic forecasting algorithm based on the ARIMA model suffers from error accumulation when the number of forecast samples is too large. Therefore, in dynamic forecasting, the forecast error increases with the number of samples, resulting in poor fitting in the later part of the forecast. The Fourier series is a periodic function that can extract periodic information from the sample data sequence. The error of the forecasting sample can be forecast according to the Fourier series of the training sample error sequence. The corrected forecast data are thus obtained, and are used to forecast the next data through dynamic forecasting, to reduce the error. In addition, there are some similar methods that can be used for error correction in this paper, such as wavelet analysis. Wavelet analysis is based on the Fourier transform, but its processing steps are more complex than the Fourier transform. Therefore, for the error correction part of this paper, the preferred solution is still the Fourier transform. To compare the forecasting effect before and after error correction, we used the average value of the forecasting error. The average relative error $\omega$ and the cumulative error $W$ of the first $l$ forecast values are
$$\omega = \frac{1}{l}\sum_{i=1}^{l} \frac{|Y_i - \bar{Y}_i|}{Y_i}, \quad 1 \le l \le N \tag{5}$$
$$W = \sum_{i=1}^{l} |Y_i - \bar{Y}_i|, \quad 1 \le l \le N, \tag{6}$$
where $N$ is the total number of forecast samples. Let $M$ be the split threshold. When the cumulative error $W$ of the forecast values exceeds the threshold value $M$, the cumulative error of the ARIMA model dynamic forecasting algorithm has great influence. Let $T$ be the total number of samples to be corrected. Then Fourier series corrections are required for the last $T$ ($T = N - l$) forecasts. 
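Eqs. (5) and (6) and the split rule can be written as a short Python sketch. The function names are hypothetical, and the code assumes that the value in the denominator of Eq. (5) is the first argument `y` and that the split point is the first `l` at which `W` strictly exceeds `M`.

```python
def mean_relative_error(y, y_bar):
    """Eq. (5): average of |Y_i - Ybar_i| / Y_i over the first l values."""
    return sum(abs(a - b) / a for a, b in zip(y, y_bar)) / len(y)


def cumulative_errors(y, y_bar):
    """Eq. (6) evaluated for l = 1, 2, ..., len(y)."""
    total, out = 0.0, []
    for a, b in zip(y, y_bar):
        total += abs(a - b)
        out.append(total)
    return out


def split_point(y, y_bar, m):
    """Smallest l whose cumulative error W exceeds m, or None if none does."""
    for l, w in enumerate(cumulative_errors(y, y_bar), start=1):
        if w > m:
            return l
    return None
```

The number of forecasts left for Fourier correction then follows as `T = N - l`, where `N` is the length of the forecast window.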
The cumulative error of the first $l$ forecasts was calculated, where $l = 1, 2, \ldots, N$. The intervals are shown in Fig. 5. Moreover, the cumulative value of the error increases along with the total number of forecasting samples, as illustrated in Fig. 6. We can see in Fig. 6 that the slope increases after the point furthest from the diagonal. Thus, the point within the interval that is furthest from the diagonal is selected as the threshold. Figure 6. Cumulative error of forecast value. We calculate the difference between the actual data $X_k$ and the corresponding forecast data $Y_k$, that is, $\Delta Z_k = Y_k - X_k,\ k = 1, 2, \ldots, N$. Then we have one set of error data $\Delta Z_1, \Delta Z_2, \ldots, \Delta Z_N$. When $\Delta Z_k > 0$, the forecast value is too large relative to the actual value, and vice versa. The Fourier series is obtained from the calculated error values, and it is used to forecast the errors of the subsequent forecast values, $\Delta Z_1^*, \Delta Z_2^*, \ldots, \Delta Z_T^*$. By adding the forecast value and the forecast error value, a more accurate forecasting value is obtained to correct the forecast value. The error data sequence is $\{\Delta Z_1^*, \Delta Z_2^*, \ldots, \Delta Z_T^*\}$, and the Fourier series is approximately expressed as
$$\Delta Z_k^* = \frac{a_0}{2} + \sum_{n=1}^{N}\left(a_n \cos\left(n \cdot k \cdot \frac{2\pi}{T}\right) + b_n \sin\left(n \cdot k \cdot \frac{2\pi}{T}\right)\right), \quad 1 \le n \le N,\ k = 1, 2, \ldots, T, \tag{7}$$
where
$$a_0 = \frac{2}{T}\sum_{h=1}^{N} \Delta Z_h \tag{8}$$
$$a_1 = \frac{2}{T}\sum_{h=1}^{N}\Delta Z_h \cos\left(h \cdot \frac{2\pi}{T}\right),\ a_2 = \frac{2}{T}\sum_{h=1}^{N}\Delta Z_h \cos\left(2 \cdot h \cdot \frac{2\pi}{T}\right),\ \ldots,\ a_T = \frac{2}{T}\sum_{h=1}^{N}\Delta Z_h \cos\left(T \cdot h \cdot \frac{2\pi}{T}\right) \tag{9}$$
$$b_1 = \frac{2}{T}\sum_{h=1}^{N}\Delta Z_h \sin\left(h \cdot \frac{2\pi}{T}\right),\ b_2 = \frac{2}{T}\sum_{h=1}^{N}\Delta Z_h \sin\left(2 \cdot h \cdot \frac{2\pi}{T}\right),\ \ldots,\ b_T = \frac{2}{T}\sum_{h=1}^{N}\Delta Z_h \sin\left(T \cdot h \cdot \frac{2\pi}{T}\right) \tag{10}$$
where $T$ is the number of forecast samples to be corrected. Substituting the values of the error data sequence into formulas (8)–(10), we can obtain the values of $a_0, a_1, a_2, \ldots, a_T, b_1, b_2, \ldots, b_T$. Then the Fourier series of the error sequence is obtained. Using the Fourier series (7) of the error sequence, we can forecast the errors of the $T$ data points to be $\Delta Z_1^*, \Delta Z_2^*, \ldots, \Delta Z_T^*$. 
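The coefficient formulas (8)–(10) and the series (7) can be sketched in Python as follows. The function names are hypothetical, and the number of harmonics is exposed as a parameter `harmonics` because the text does not fix how many terms of the sum are kept in practice; that choice, and passing the period explicitly, are assumptions of this sketch.

```python
import math


def fourier_coefficients(dz, period, harmonics):
    """Eqs. (8)-(10): a0, [a_1..a_H], [b_1..b_H] from the error sequence dz.

    dz[0] holds Delta Z_1, so the index h in the formulas starts at 1.
    """
    scale = 2.0 / period
    step = 2.0 * math.pi / period
    a0 = scale * sum(dz)
    a, b = [], []
    for n in range(1, harmonics + 1):
        a.append(scale * sum(z * math.cos(n * h * step)
                             for h, z in enumerate(dz, start=1)))
        b.append(scale * sum(z * math.sin(n * h * step)
                             for h, z in enumerate(dz, start=1)))
    return a0, a, b


def predict_error(k, a0, a, b, period):
    """Eq. (7): forecast error Delta Z*_k for position k = 1, 2, ..., T."""
    step = 2.0 * math.pi / period
    return a0 / 2.0 + sum(
        an * math.cos(n * k * step) + bn * math.sin(n * k * step)
        for n, (an, bn) in enumerate(zip(a, b), start=1)
    )
```

When the error sequence covers exactly one period and contains only low-frequency components, the truncated series reproduces it; with noisy errors it acts as a smoothed periodic estimate.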
We add the forecast error sequence $\Delta Z_1^*, \Delta Z_2^*, \ldots, \Delta Z_T^*$ to the corresponding forecast data sequence $Y_{l+1}, Y_{l+2}, \ldots, Y_N$, to obtain $T$ corrected forecast data points. 5.4. DDoS identification A key point of detecting DDoS by using the MPFF feature is to determine whether or not the MPFF value of each time sample point is abnormal. However, due to normal congestion in the LAN, a large threshold value may lead to delay or omission of a DDoS alarm, while a small one may lead to a false alarm. Due to the need to detect three types of DDoSs, this paper analyzes the MPFF values of normal and abnormal flows, and obtains the MPFF intervals of normal flow and abnormal flow as $[a, b]$ and $[c, d]$, respectively. The threshold is set to $U = (b + c)/2$. When the MPFF value of a time sample point exceeds the preset threshold value, the traffic is considered abnormal. The anomaly-detection method can eliminate the influence of network noise and normal network congestion on the forecasting results; this is an optimization supplement to the time-series forecasting model. 6. PERFORMANCE EVALUATION 6.1. Experimental data sets and evaluation criteria The DDoS 2007 dataset is selected as the experimental dataset in this paper [24]. We first define the detection rate, false alarm rate and total error rate. The detection rate (DR) is the probability that an actual attack can be detected. It is defined as
$$DR = \frac{TN}{TN + FN}, \tag{11}$$
where TN is the number of correctly identified attack samples, and FN is the number of misidentified attack samples. Then the false-negative rate is FN/(TN+FN). The false alarm rate FR is the probability that normal user behavior is misjudged as an attack. It is defined as
$$FR = \frac{FP}{TP + FP}, \tag{12}$$
where FP is the number of normal samples misidentified as attacks, and TP is the number of correctly identified normal samples. The total error rate ER is the probability that the user behavior is misjudged. It is a comprehensive reflection of the detection rate and false alarm rate. 
It is defined as
$$ER = \frac{FN + FP}{TP + FP + TN + FN}. \tag{13}$$
Conversely, (1−ER) is the accuracy rate AR. The experiment in this paper is divided into four parts. The first is to extract the MPFF feature of the original sequence, use the established ARIMA model to forecast, correct the forecasting error through the Fourier series, and detect DDoSs. The results of this paper are compared with c-SVR regression forecasting and the SMPM algorithm [37] to verify that the proposed DDoS-detection method based on MPFF time series has better sequence modeling and capability for DDoS recognition. Eviews 8.0 is used to implement the ARIMA model, and DDoSs are detected in the MATLAB 2012a environment. The LIBSVM [38] toolkit is used to complete the SVR-related experiments. 6.2. Experimental results Sampling was performed at a time interval of 1 s, and the MPFF value of each sample was calculated according to the MPFF feature fusion algorithm. According to statistical analysis, when $\varphi$, $\rho$ and $\omega$ are 1, 0.0001 and 0.0001, respectively, the MPFF values of normal flow can be well separated from those of abnormal flow. The parameters used for the MPFF time series are $\alpha = \beta = 0.35$ and $\theta_1 = \theta_2 = 1$. The comparison of the MPFF time series of normal and abnormal flow calculated by MATLAB is shown in Fig. 7. Figure 7. Comparison of MPFF time series of normal flow and attack flow. By comparing Figs. 2 and 7, we can conclude that under normal conditions, the MPFF value of network flow is small, the fluctuation is small, and the trend is smooth, while during an attack period, the MPFF value of network flow is large, the value fluctuates greatly, the peak value is close to 70, and the MPFF value is unstable. The interval between normal and abnormal flows is calculated, and the threshold U is calculated as 10. The forecast sample length lies between 1 and 100. 
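The fusion of Eq. (4) and the threshold rule U = (b + c)/2 of Section 5.4 can be sketched in Python. The function names are hypothetical, and taking b as the largest observed normal MPFF value and c as the smallest observed attack MPFF value is an assumption about how the intervals are obtained. The default coefficients are the values reported above.

```python
def mpff(tnf_value, unf_value, inf_value, phi=1.0, rho=0.0001, omega=0.0001):
    """Eq. (4): weighted fusion of the TCP, UDP and ICMP features."""
    return phi * tnf_value + rho * unf_value + omega * inf_value


def midpoint_threshold(normal_values, attack_values):
    """U = (b + c) / 2, with b the upper end of the normal interval
    and c the lower end of the attack interval."""
    b = max(normal_values)
    c = min(attack_values)
    return (b + c) / 2


def is_abnormal(value, u):
    """A sample is flagged when its MPFF value exceeds the threshold."""
    return value > u
```

For example, hypothetical normal values of at most 8 and attack values of at least 12 would give U = 10.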
The ARIMA (1,1,1) model is used to forecast and correct the MPFF value after 150 sample points. The average error is shown in Fig. 8, where it can be seen that both curves have an upward parabolic shape. The experimental results show that the forecasting effect of the proposed algorithm improves as the forecast sample length increases up to 50 s, and is best in the vicinity of 50 s. When the forecast sample length exceeds 50 s, the detection effect gradually decreases. The upward parabolic shape can be explained as follows: when the forecast sample length is too short, the number of samples used for training is insufficient, which leads to inaccurate forecasting results and large average errors; when the forecast sample length is too long, the cumulative error of the ARIMA model results in larger forecast and average errors. Figure 8. Forecast error values for different forecasting sample lengths. The forecast sample length is chosen as 50. The forecast and actual values are fitted as shown in Fig. 9. We see from the figure that a long-term forecast can be achieved by combining static and dynamic models, i.e. the next 50 forecasts can be obtained. However, error accumulation causes the error of the latter part of the forecast to increase continually. The forecast is still far from the expectation. The mean value of the forecasting error is 0.2798. Figure 9. Fitting of forecast and actual values. Using the split-threshold selection method of Section 5.3, the threshold M of the cumulative error W is set to 27.4, and the cumulative error of the first l forecasts is calculated. It is concluded that when l=30, the cumulative error reaches M. 
The Fourier series is used to correct the error of the next 20 data points. The fit between the forecast and actual values after Fourier correction is shown in Fig. 10. The mean value of the calculated forecasting error is 0.1496, which is about 0.13 lower than the value before error correction.
Figure 10. Fitting of forecast value after error correction and actual values.
According to the analysis of Section 5.2.1, the fluctuation range of the MPFF value under normal flow is within the threshold U (0–10). After the abnormal-alarm threshold is set, the attack samples in this dataset can be clearly identified. Based on the detection results obtained from the actual MPFF value (Fig. 11), the DDoS occurs at the 31st forecast sample point, where the MPFF value is 10.2551. Based on the forecast MPFF value, however, the DDoS occurs at the 34th forecast sample point, where the MPFF value is 10.2217. The uncorrected forecast therefore still suffers from inaccuracy and hysteresis, here a lag of three sample points. The detection, false-alarm and total error (ER) rates of this detection are ~0.8421, 0 and 0.04, respectively. After the forecasting error is corrected using the Fourier series, the forecast value shows that the attack occurs at the 31st sample point, where the MPFF value is 10.2422. The detection rate DR of the test result is 1, the false-positive rate FR is 0 and the total error rate ER is 0. The detection result is greatly improved compared with that before error correction.
Figure 11. Detection results of DDoS before error correction.
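To make the Fourier correction step more concrete, the sketch below fits a truncated Fourier series to the residuals (actual minus forecast) of the points already observed and adds the extrapolated residual to later forecasts. It illustrates the general residual-correction idea under stated assumptions and is not the authors' formulation, which is given in an earlier section. The period is assumed to equal the length of the fitting window, the coefficients come from the discrete orthogonality of sines and cosines on evenly spaced samples, and all function names are hypothetical.

```python
import math


def fit_fourier(residuals, harmonics):
    """Fit a0 + sum_k (a_k cos + b_k sin) with period len(residuals)."""
    n = len(residuals)
    if not 0 < harmonics < n / 2:
        raise ValueError("harmonics must be positive and below len(residuals) / 2")
    a0 = sum(residuals) / n
    coeffs = []
    for k in range(1, harmonics + 1):
        w = 2.0 * math.pi * k / n
        a_k = 2.0 / n * sum(r * math.cos(w * t) for t, r in enumerate(residuals))
        b_k = 2.0 / n * sum(r * math.sin(w * t) for t, r in enumerate(residuals))
        coeffs.append((a_k, b_k))
    return a0, coeffs, n


def eval_fourier(model, t):
    """Evaluate the fitted series at time index t (t may lie beyond the fitting window)."""
    a0, coeffs, n = model
    value = a0
    for k, (a_k, b_k) in enumerate(coeffs, start=1):
        w = 2.0 * math.pi * k / n
        value += a_k * math.cos(w * t) + b_k * math.sin(w * t)
    return value


def correct_forecast(forecast_tail, model, start):
    """Add the extrapolated residual to each forecast from time index `start` onward."""
    return [f + eval_fourier(model, start + i) for i, f in enumerate(forecast_tail)]


if __name__ == "__main__":
    # Synthetic residuals over a 30-point window, for illustration only.
    window = [0.5 + 0.3 * math.sin(2.0 * math.pi * t / 30) for t in range(30)]
    model = fit_fourier(window, harmonics=3)
    print(correct_forecast([1.0] * 20, model, start=30)[:3])
```

In the setting described above, the fitting window would be the first 30 forecasts and `forecast_tail` the remaining 20. Because the series is periodic in the window length, the extrapolated correction repeats the fitted pattern, which is one reason such a correction suits short horizons only.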
Low-rate DDoS attacks, link-flooding attacks and application-level attacks are intelligent DDoS attacks developed from the traditional DDoS attack [37]. Because their flow characteristics are similar, this method is also applicable to them.
6.3. Comparison with related work
The results of this paper are compared with those of the c-SVR regression forecasting algorithm and the SMPM algorithm to verify that the Fourier-ARIMA detection model based on MPFF time series has better sequence modeling and detection ability for DDoSs. The LIBSVM tool developed by Prof. Lin Chih-Jen at National Taiwan University [38] and the SMPM algorithm [37] were used in the MATLAB 2012a environment in this experiment. The same 150 samples as described above were used as training samples. DDoS detection was performed on samples with a length of 50 s using the algorithm in this work, the ARIMA model forecasting method, the c-SVR regression algorithm and the SMPM algorithm. The test results were classified by assigning a value of −1 to normal network traffic and 1 to possible DDoSs. The detection results of the four algorithms are shown in Fig. 12. The figure shows that when the detection length is 50 s, the detection rate of the proposed DDoS detection method reaches 100%, and the false-alarm rate and total error rate are zero, indicating that this method achieves high accuracy and a low false-alarm rate. The ARIMA forecast without correction yields three inaccurately forecast sample points, indicating that Fourier error correction can improve the forecasting results of the ARIMA model and reduce the forecast error. The detection rate of the c-SVR algorithm is low, and seven attack sample points are incorrectly identified as normal samples.
The detection rate of the SMPM algorithm can also reach 100%, but its false-alarm rate is high: normal samples are identified as attack samples, which would cause false alarms in practical application.
Figure 12. Comparison of detection results of DDoS for different algorithms.
The same 150 samples were used as training samples. Four sets of testing data with lengths of 25, 50, 75 and 100 time sample points were detected using the algorithm in this work, the ARIMA model, c-SVR regression forecasting and the SMPM algorithm. The test results are shown in Table 3. Comparison between the four sets of test results shows that as the sample quantity increases, the detection method in this work gradually shows advantages over the other methods. The detection rates of all four methods are affected to different degrees as the test sample length increases, but those of the ARIMA and SMPM algorithms decrease markedly, indicating that the detection method in this work can detect DDoSs earlier. The detection rate of this algorithm is 100% when the length of the test samples is 50 and 75. When the test length is 25, the detection rate still reaches 93.33%, showing good results and robustness at different test lengths.
Table 3. Performance comparison of different test methods.
Method          Rate (%)   T = 25 s   T = 50 s   T = 75 s   T = 100 s
Fourier-ARIMA   DR         93.33      100        100        89.04
                FR         0          0          7.41       18.52
                ER         4          0          2.67       13
ARIMA           DR         60         82.4       79.17      73.6
                FR         0          0          7.4        21.4
                ER         24         6          16         25
c-SVR           DR         66.67      60         79.17      87.67
                FR         0          10         11.11      7.41
                ER         20         22         17.33      11
SMPM            DR         85.71      100        77.08      82.26
                FR         14.29      36.67      3.85       0
                ER         12         22         16         51
Figure 13 shows the average detection rate, average false-positive rate and average error rate of the four methods. The average detection rate of the proposed method is 95.59%, which is 22.21 and 9.33 percentage points higher than that of the c-SVR regression forecasting algorithm and the SMPM algorithm, respectively. The results show that the method distinguishes more accurately between normal and abnormal flows and is sensitive to the change in the MPFF value caused by the attack flow. Over the four experiments, the average false-positive rate of this work is 6.48% and the average error rate is 4.92%, both lower than those of the other three detection methods. This method can therefore identify DDoSs more accurately.
Figure 13.
Comparison of average detection rate, average false-alarm rate and average error rate for four detection algorithms.
7. CONCLUSION
Given current SAN user behavior, existing DDoS detection methods suffer from detecting only a single attack type and from high false-negative and false-positive rates. In this paper, we have proposed a DDoS detection method based on MPFF time-series forecasting. Specifically, we describe the characteristics of DDoS based on the TCP, UDP and ICMP protocols. Then, we present the details of the MPFF feature. After that, we build an ARIMA model, which is subsequently used to forecast and detect DDoS for SAN, with Fourier series used to correct forecasting errors. The experimental results show that this method can make full use of the information in packets, detect multiple types of attacks, and accurately distinguish normal flow from attack flow. It provides a feasible solution for situation estimation and thus a solid basis for DDoS attack recognition in SAN. Compared with other detection methods, the proposed method can detect DDoS earlier, and it shows better performance in terms of detection rate, false-positive rate and time delay. In future, we will study the effect of MPFF time series and the impacts of model parameters to further improve the overall performance of the proposed detection.
FUNDING
The experimental environment, including hardware devices and software products, of this work was supported by the National Natural Science Foundation of China [61762033, 61363071 and 61702539]. The experimental operation and maintenance expenses were supported by the National Natural Science Foundation of Hainan Province [617048 and 2018CXTD333].
REFERENCES
1 Rathore, S., Sharma, P.K., Loia, V. et al. (2017) Social network security: issues, challenges, threats, and solutions. Inf. Sci., 421, 43–69.
2 Sun, W., Cai, Z., Li, Y., Liu, F., Fang, S. and Wang, G. (2018) Security and privacy in the medical Internet of things. Secur. Commun. Netw., 2018, 5978636.
3 Ning, Z., Hu, X., Chen, Z., Zhou, M., Hu, B., Cheng, J. and Obaidat, M.S. (2017) A cooperative quality-aware service access system for social Internet of vehicles. IEEE IoT J., PP, 1–1.
4 Ning, Z., Xia, F., Ullah, N., Kong, X. and Hu, X. (2017) Vehicular social networks: enabling smart mobility. IEEE Commun. Mag., 55, 49–55.
5 Zhang, J., Hu, X., Ning, Z., Ngai, E., Zhou, L., Wei, J., Cheng, J. and Hu, B. (2017) Energy-latency trade-off for energy-aware offloading in mobile edge computing networks. IEEE IoT J., PP, 1–1.
6 Cai, Z., Wang, Z., Zheng, K. and Cao, J. (2013) A distributed TCAM coprocessor architecture for integrated longest prefix matching, policy filtering, and content filtering. IEEE Trans. Comput., 62, 417–427.
7 Elejla, O.E., Anbar, M. and Belaton, B. (2016) ICMPv6-based DoS and DDoSs and defense mechanisms: review. IETE Tech. Rev., 34, 390–407.
8 Zargar, S.T., Joshi, J. and Tipper, D. (2013) A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks. IEEE Commun. Surv. Tutorials, 15, 2046–2069.
9 Zhou, C.V., Leckie, C. and Karunasekera, S. (2010) A survey of coordinated attacks and collaborative intrusion detection. Comput. Secur., 29, 124–140.
10 Pensa, R.G. and Blasi, G.D. (2017) A privacy self-assessment framework for online social networks. Expert Syst. Appl., 86, 18–31.
11 Rong, H., Ma, T., Tang, M. et al. (2017) A novel subgraph K+-isomorphism method in social network based on graph similarity detection. Soft Comput., 21, 1–19.
12 Ma, T., Wang, Y., Tang, M. et al. (2016) LED: a fast overlapping communities detection algorithm based on structural clustering. Neurocomputing, 207, 488–500.
13 Ferrag, M.A. and Ahmim, A. (2017) ESSPR: an efficient secure routing scheme based on searchable encryption with vehicle proxy re-encryption for vehicular peer-to-peer social network. Telecommun. Syst., 66, 481–503.
14 Shen, J., Shen, J., Chen, X. et al. (2016) An efficient public auditing protocol with novel dynamic structure for cloud data. IEEE Trans. Inf. Forensics Secur., 12, 2402–2415.
15 Shen, J., Chang, S., Shen, J. et al. (2016) A lightweight multi-layer authentication protocol for wireless body area networks. Future Generation Comput. Syst., 78, 956–963.
16 Yu, Y., Long, J. and Cai, Z. (2017) Network intrusion detection through stacking dilated convolutional autoencoders. Secur. Commun. Netw., 2017, 1–11.
17 Gu, B. and Sheng, V.S. (2017) A robust regularization path algorithm for ν-support vector classification. IEEE Trans. Neural Netw. Learn. Syst., 28, 1241–1248.
18 Gu, B., Sun, X. and Sheng, V.S. (2017) Structural minimax probability machine. IEEE Trans. Neural Netw. Learn. Syst., 28, 1646–1656.
19 Gu, B., Sheng, V.S., Tay, K.Y. et al. (2017) Incremental support vector learning for ordinal regression. IEEE Trans. Neural Netw. Learn. Syst., 26, 1403–1416.
20 Wang, C., Miu, T., Luo, X. and Wang, J. (2018) Skyshield: a sketch-based defense system against application layer DDoS attacks. IEEE Trans. Inf. Forensics Secur., 13, 559–573.
21 Zhang, C., Cai, Z., Chen, W., Luo, X. and Yin, J. (2012) Flow level detection and filtering of low-rate DDoS. Comput. Netw., 56, 3417–3431.
22 David, J. and Thomas, C. (2015) DDoS detection using fast entropy approach on flow-based network traffic. Procedia Comput. Sci., 50, 30–36.
23 Zheng, K.F. and Wang, X.J. (2011) Detecting DDoS with Hurst parameter of marginal spectrum. J. Beijing Univ. Posts Telecomm., 34, 128–132.
24 Sang, M.L., Dong, S.K., Lee, J.H. et al. (2012) Detection of DDoSs using optimized traffic matrix. Comput. Math. Appl., 63, 501–510.
25 Karnwal, T., Sivakumar, T. and Aghila, G. (2012) A Comber Approach to Protect Cloud Computing Against XML DDoS and HTTP DDoS, Proc. Electrical, Electronics and Computer Science (SCEECS’12), 1–5. IEEE, New York.
26 Tama, B.A. and Rhee, K.H. (2015) Data mining techniques in DoS/DDoS detection: a literature review. Spec. Sect. Inf. Commun. Syst. Secur., 18, 3739–3747.
27 Latif, R., Abbas, H., Latif, S. et al. (2015) EVFDT: an enhanced very fast decision tree algorithm for detecting distributed denial of service attack in cloud-assisted wireless body area network. Mobile Inf. Syst., 2015, 1–13.
28 Fu, T.C. (2011) A review on time series data mining. Eng. Appl. Artif. Intell., 24, 164–181.
29 Bagnall, A., Lines, J., Bostrom, A. et al. (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min. Knowl. Discov., 31, 606–660.
30 Nezhad, S.M.T., Nazari, M. and Gharavol, E.A. (2016) A novel DoS and DDoSs detection algorithm using ARIMA time series model and chaotic system in computer networks. IEEE Commun. Lett., 20, 700–703.
31 Andrysiak, T., Łukasz, S., Maszewski, M. et al. (2015) A DDoSs detection based on conditional heteroscedastic time series models. Image Process. Commun., 20, 23–33.
32 MIT Lincoln Laboratory (2000) DARPA Intrusion Detection Scenario-Specific Data Sets, http://www.ll.mit.edu/mission/communications/cyber/CSTcorpora/ideval/data/2000data.html (11 March 2000).
33 The Cooperative Association for Internet Data Analysis, The CAIDA UCSD ‘DDoS 2007’ Dataset, http://www.caida.org/data/passive/ddos-20070804_dataset.xml (5 August 2007).
34 Xie, Y. and Yu, S.Z. (2009) Monitoring the application-layer DDoSs for popular websites. IEEE/ACM Trans. Netw., 17, 15–25.
35 Rosli, A., Taib, A.M. and Wan, N.A.W.A. (2017) Utilizing the enhanced risk assessment equation to determine the apparent risk due to user datagram protocol (UDP) flooding attack. Int. J. Mobile Comput. Multimed. Commun., 9, 1–4.
36 Kumar, M.A.V. and Udayakumar, R. (2015) Identifying and blocking high and low rate DDOS ICMP flooding. Indian J. Sci. Technol., 8, 1–5.
37 Gu, B., Sun, X. and Sheng, V.S. (2017) Structural minimax probability machine. IEEE Trans. Neural Netw. Learn. Syst., 28, 1646–1656.
38 Lin, C.-J., LibSVM 3.22-A Library for Support Vector Machines, http://www.csie.ntu.edu.tw/~cjlin/libsvm/ (22 October 2016).
Author notes
Handling editor: Zhaolong Ning
© The British Computer Society 2018. All rights reserved.
For permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) TI - A DDoS Detection Method for Socially Aware Networking Based on Forecasting Fusion Feature Sequence JF - The Computer Journal DO - 10.1093/comjnl/bxy025 DA - 2018-03-24 UR - https://www.deepdyve.com/lp/oxford-university-press/a-ddos-detection-method-for-socially-aware-networking-based-on-szzaBMf4Fb SP - 1 EP - 970 VL - Advance Article IS - 7 DP - DeepDyve ER -