TY - JOUR AU1 - Chaurasia, Nisha AU2 - Tapaswi, Shashikala AU3 - Dhar, Joydip AB - Abstract Effective server consolidation in the cloud has become one of the major challenges. The determination of over-utilized and under-utilized servers, and the subsequent migration, needs to work in concert with energy conservation and optimal resource usage. Clustering of servers allows easy retrieval of servers for the best possible allocation of tasks. The paper focuses on clustering servers effectively using the Expectation Maximization (EM) concept. It presents an algorithm using EM as a phase of server consolidation in the cloud. Employing EM for clustering yields more uniform clusters of servers, leading to improved allocation of resource requests. The results of the proposed scheme have been compared with existing K-means, Fuzzy C-Means and Spectral clustering. The proposed EM algorithm performs better in terms of handling probabilistic constraints and guaranteeing convergence with well-separated components. 1. INTRODUCTION In a Cloud Computing [18, 22, 26] scenario, a host (physical server) has multiple Virtual Machines (VMs) [28, 29, 31] installed that perform multiple computations in parallel. To minimize energy consumption, the number of active physical servers is kept as small as possible while remaining capable of processing the incoming requests. This introduces the concept of Server Consolidation [19, 25, 27, 34, 38]. Server Consolidation is the allocation of the maximum number of VMs to the minimum number of hosts. Consolidation allows redundant servers to be put into a low-power state, switched off, or devoted to the execution of incremental workload [14]. Consolidation can be performed at both the server and the VM level. When consolidation is achieved by reducing the number of physical servers while maintaining the same overall number of installed VMs, it is called server consolidation; when it is achieved by executing multiple VMs on the same host to reduce power consumption, it is referred to as VM consolidation. For server consolidation, it is recommended to perform resource isolation and VM migration. Hence, for consolidation, the resources are managed such that no resource gets compromised during the migration process. Server consolidation, VM migration and resource isolation are inter-related: to achieve any one of them, the other two implicitly come into account as affecting factors. For instance, if server consolidation is to be achieved, it requires effective handling of VM migration to reduce the number of physical servers, along with isolating resources so that no resource conflict occurs during migration. The relation among the three can be visualized in Fig. 1. Figure 1. Relationship among consolidation, migration and resources. There are many issues that affect consolidation, including server and workload behavior, security restrictions requiring co-location of certain application components, and power line redundancy restrictions [2]. Server consolidation is considered an NP-complete problem where the focus is on conserving energy by turning on as few servers as possible by consolidating the workload.
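Since the problem reduces to bin packing, a small sketch can make this view concrete. The host capacity, VM demands and the first-fit-decreasing heuristic below are illustrative assumptions only, not the consolidation scheme proposed in this paper:

```python
# Minimal first-fit-decreasing sketch of the bin-packing view of consolidation.
# The VM demands and host capacity below are hypothetical, chosen only to
# illustrate the idea; the paper itself does not prescribe this heuristic.
from typing import List

HOST_CPU, HOST_MEM = 16.0, 64.0  # assumed per-host capacity (cores, GB)

def first_fit_decreasing(vms: List[tuple]) -> List[List[tuple]]:
    """Pack (cpu, mem) VM demands onto as few hosts as possible."""
    hosts: List[dict] = []
    # Place the largest VMs first so the heuristic wastes less residual capacity.
    for cpu, mem in sorted(vms, key=lambda v: (v[0], v[1]), reverse=True):
        for h in hosts:
            if h["cpu"] + cpu <= HOST_CPU and h["mem"] + mem <= HOST_MEM:
                h["cpu"] += cpu
                h["mem"] += mem
                h["vms"].append((cpu, mem))
                break
        else:  # no existing host can take the VM, so activate a new one
            hosts.append({"cpu": cpu, "mem": mem, "vms": [(cpu, mem)]})
    return [h["vms"] for h in hosts]

if __name__ == "__main__":
    demo_vms = [(4, 8), (2, 16), (8, 32), (1, 4), (6, 24), (2, 8)]
    packing = first_fit_decreasing(demo_vms)
    print(f"{len(demo_vms)} VMs consolidated onto {len(packing)} hosts: {packing}")
```

Exact consolidation is NP-complete, so heuristics of this kind trade optimality for speed.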
The challenge faced in server consolidation is the difficulty of mapping VMs with different specifications to hosts such that the resources allocated to the VMs on a server do not exceed that server's capacity, while keeping the total number of hosts to a minimum. VM consolidation raises several management issues because it aims at an optimal exploitation of available resources while avoiding severe performance degradation due to the resource consumption of co-located VMs [23]. The paper aims to achieve efficient consolidation by clustering servers for improved resource allocation using the Expectation Maximization (EM) approach [36, 40]. Owing to clustering, many inexpensive computers can be used to replace expensive servers, reducing the energy consumption incurred by deploying large servers [4, 22]. The EM algorithm is an effective and preferable unsupervised clustering method which estimates the parameters of statistical models by iteratively determining the maximum likelihood. It helps in fitting mixture-of-Gaussians models. A Gaussian mixture model is an unsupervised probabilistic learning model which assumes that all the data points in a data set are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. EM is also considered one of the most practical methods for learning latent variable models. Latent variable models are statistical models where latent (hidden) variables are inferred in addition to the observed variables. EM attempts to uncover the parameters of the probability distribution that maximize the likelihood of the observed attributes, on the basis of which clustering is done. By using maximum likelihood, the EM algorithm attempts to perform precise clustering with few or no outliers. Formally, EM replaces the maximization of the observed-data log likelihood with the maximization of the expected complete-data log likelihood conditioned on the observations [11]. The EM algorithm works in two steps: (i) expectation: formulates a function for the expectation of the log likelihood on the basis of the current estimates of the parameters; (ii) maximization: maximizes the expected log likelihood to compute updated parameters. The succeeding sections are organized as follows: Section 2 covers the related work, Section 3 presents the different profiles maintained for consolidation, Section 4 describes the proposed model with categorization of servers using EM, Section 5 presents the experimentation and results, and Section 6 concludes the presented work. 2. RELATED WORK Since cloud computing is becoming increasingly popular, the ever-increasing requests from users demand that more data centers (collections of computers that store data within a local network) be built. This raises the issue of reducing these data centers to an optimum, leading to consolidation [6]. This section explores the work performed so far to achieve an ideal level of server consolidation. Many research contributions confirm that consolidation has been attained to a remarkable degree. The various prominent consolidation schemes analyzed or proposed by researchers may be categorized into three types: Nature-inspired: The authors in the nature-inspired category have performed a deep study of the behavior of various natural phenomena such as Swarm V-formation and Honeybee Hive formation. The authors have tried to learn the manner in which these natural beings coordinate with each other so that energy is conserved.
Mapping these natural beings and their behavior to the Cloud scenario, the beings are considered as the servers or data centers, and their behavior is imitated as the behavior which these Cloud servers would adopt to conserve energy. Hence, according to the way the natural beings coordinate their work, the data centers in Clouds are categorized into active servers, fully loaded active servers, idle servers and powered-down servers. In the literature, it was found that managing the servers into these classes allows easier assessment of how energy can be conserved. The following are some significant contributions in this category: Pop et al. [10] studied the birds' V-formation during the migration process, in which the birds periodically change their flying location in the V-formation in order to conserve energy. They proposed a Swarm-inspired Consolidation algorithm for minimizing power consumption. Mastroianni et al. [14] present ecoCloud, a self-organizing and adaptive approach for the consolidation of VMs. The approach is inspired by the Ant algorithm. The VM mapping is based on Bernoulli trials, through which each single server decides (on the basis of local information) whether or not it is available to execute an application. The authors evaluated the ecoCloud performance by proposing a fluid mathematical model. Singh and Hemalatha [17] aim to support server consolidation by keeping in view the maximization of resource utilization and the reduction of energy. They performed consolidation by implementing the HoneyBee algorithm with hierarchical clustering. The allocation of VMs is done by a policy named VmAllocationPolicy, which is responsible for the proper allocation of VMs to hosts. They proposed the HoneyBee Cluster Technique algorithm to cluster the resources, easing the search for available resources. Parameter-based: The authors have focused on a certain evaluation parameter, viz. memory or CPU requirement. A parameter is pre-decided and, based on this parameter, they proposed consolidation algorithms. The purpose of this parameter is to determine the most appropriate server to which an incoming workload can be allotted. These parameters are formally based on the NP-hard bin-packing problem, such as best-fit, first-fit and heaviest-fit server finding. The following are a few contributions in this category: Khanna et al. [1] focused on the performance of servers and tried to achieve the maximum acceptable performance while reducing the cost of migration. They proposed a Dynamic Management algorithm which finds the physical machine (PM) from which a VM is to be migrated, then determines which VM on that PM is to be considered and finally finds the new PM to which the respective VM will be migrated. They also explored the considerations required for performing server consolidation. Gong and Gu [5] presented a pattern-driven application consolidation (PAC) system using the Fast Fourier Transform for the extraction of repeating signature patterns (called signatures of different VMs) from raw time series measurements. PAC also achieves robust matching of signatures using the dynamic time warping algorithm. The VMs are assigned hosts according to the largest resource demand by PAC. PAC finds the host for a VM by finding a host whose residual resource signatures best match the VM resource demand signature. Ho et al. [6] proposed a server consolidation algorithm guaranteeing consolidation along with bounded relocation costs, using a modified bin-packing formulation for relocation.
They proposed an algorithm named Heaviest-First Relocation which incorporates the fundamentals of First Fit and seeks the heaviest bin in each phase of the relocation method, presenting good solutions in both the worst and the average case. Lee and Sahu [8] achieved consolidation by using a cluster-based approach, moving beyond the sole use of traditional bin-packing algorithms. They proposed a cluster-based server consolidation algorithm which pre-estimates the amount of CPU time and memory required while running an application and the amount of network traffic between two applications. The algorithm targets minimizing resource usage as far as possible with a given number of machines. Nevithitha and Sriram [16] consider batch and transactional workloads in the cloud for consolidation. The concept used is Dependency Structure Prioritization for the assignment of a priority to a job. The dependency among jobs is analyzed for scheduling in the form of a structure (they considered the structure as a tree), which can be either an open or a closed structure. The priority of each job is assigned on the basis of the maximum weight in the constructed dependency matrix. Corradi et al. [23] present a Cloud management platform which optimizes VM consolidation along three main dimensions: power consumption, host resources and networking. They propose this management solution for OpenStack, which is a solution for creating and managing Cloud infrastructure. The authors prefer OpenStack as it offers live migration. The Network File System is utilized for network storage, with the Kernel-based Virtual Machine hypervisor only. Rao and Thilagam [33] contributed to consolidation by heuristically reducing the fragmentation of residual resources. They proposed the RFAware Server Consolidation heuristic-based approach, which performs defragmentation along with a reduction of active servers in the cloud. They aim to explore heuristics leading to a reduction in residual resource fragmentation, making the residual resources useful and reducing cost and energy consumption. The authors formulated the multi-objective problem aiming at improvement in resource utilization and reduction in energy consumption, Service Level Agreement (SLA) violations, number of live migrations and residual resource fragmentation. Cost-based or Performance-based: The authors consider a particular aspect while performing the computation in the Cloud in order to minimize the cost of migration, CPU or disk usage, or input/output events. They contributed to consolidation by limiting these aspects, resulting in high performance and an increased number of requests completed without violating the SLA. Again, the following are some significant contributions in this category: Abdulgafer et al. [3] redesigned an existing network by applying server consolidation. The authors experimented with different scenarios in order to estimate how server consolidation can result in performance improvement of under-utilized (UU) servers. The objective of the experiment was to maximize network performance, leading to a reduced number of UU servers. They examined the changes in performance, CPU usage and idle time of each server before and after the consolidation by varying the number of servers for each scenario. Ye et al. [7] experimented with live migration of multiple VMs under different resource reservation methods.
They applied resource reservation strategies on both the source and target machines and proposed a framework for VM live migration which comprises modules aiming to migrate the VM workloads from the source machine to the target machine concurrently, depending on the migration requirement, vacating the source machine as a result. Marzolla et al. [9] proposed a novel approach for server consolidation through gossiping. They proposed a fully decentralized algorithm named V−MAN. V−MAN attempts to consolidate cloud data centers using a gossiping protocol. V−MAN searches for new arrangements of VM instances leading to the maximization of the number of empty servers. V−MAN leads one server to interact with a server hosting a higher number of VMs and perform VM migration until the receiving server approaches its maximum capacity. Huang et al. [12] proposed a framework which is able to consolidate VMs considering MapReduce-enabled computing clouds. The consolidation has been carried out for both MapReduce and non-MapReduce VM instances. They considered the VM consolidation problem as an integer nonlinear optimization problem for MapReduce-enabled computing clouds on the basis of the proposed SLA models. The I/O bandwidth of a physical server is distributed uniformly among all of the MapReduce instances, while the leftover bandwidth is divided between non-MapReduce instances in a statistical multiplexing manner. Liu et al. [13] proposed a priority-based method for consolidating parallel workloads in the cloud. The computing capacity of each node is partitioned, using virtualization technology, into two tiers: a foreground VM tier (having high CPU priority) and a background VM tier (having low CPU priority). The proposal is based on the batch scheduling algorithm First Come First Serve (FCFS) for parallel jobs. The goal of the proposed scheduling method is to enhance the utilization of the servers which are allocated to parallel discrete event jobs and to preserve the FCFS order of jobs whenever the available resources fulfill the needs of these jobs. Beloglazov and Buyya [15] proposed a novel approach based on a Markov chain model to determine the overloaded host optimally by maximizing the mean intermigration time under the specified QoS goal. They proposed the Optimal Markov Host Overload Detection (MHOD) algorithm, whose efficiency is evaluated using an optimal (OPT) offline algorithm. The proposed MHOD–OPT algorithm dynamically selects the best window size to eliminate the bias of the estimates. In addition, it benefits from the small sampling error of large window sizes and the small identification error of small window sizes. Xia et al. [37] proposed an analytic model-based approach for quality evaluation of IaaS clouds by considering the expected request completion time, rejection probability and system overhead rate as key quality metrics. It identifies the optimal tradeoff between performance and system overhead. The authors assure that their assumptions and model can easily be adjusted to deal with the present scenario by setting the Cloud Management Unit queue capacity and its processing rate to infinity (a practically large number). Shen et al. [32] propose a stochastic model using queuing theory for energy-efficient consolidation by dynamically powering off idle servers. They developed an algorithm, based on the Broyden–Fletcher–Goldfarb–Shanno method, for the optimal estimation of an efficient data center. They also implemented a Stochastic Right-sizing Model for modeling the problem.
The authors used a Quasi-Birth–Death process (M/M/N) and the matrix-analytic method for the study of the model. Although there are many significant contributions, the need for improvement remains where higher consolidation is sought. The subsequent section suggests profiling of the consolidation process to expand on consolidation in detail. 3. CONSOLIDATION PROFILING: THE PROBLEM The consolidation work may be carried out on the basis of three categories: resource profiling, workload profiling and migration profiling. Resource profiling: The resources, viz. CPU, memory and bandwidth, are requested by cloud users whenever they send a VM creation request. It becomes cumbersome to allocate the available resources for the new request, as there may not be sufficient resources to fulfill the requirement. Hence, a resource profile is maintained where each type of resource is kept under a different server cluster of its type. The clustering of resources eases the direction of a request to the appropriate requested resource. This reduces the effort of searching for the resources in the resource pool. Workload profiling: The workload of each task is made to fall under a relevant category, making it easy to maintain the working status of each active server. The workload profile manages the active servers into over-utilized (OU), under-utilized (UU), normal (N) and idle (I) servers. The servers above a predefined maximum tolerable workload are said to be OU, the servers below the minimum tolerable workload are termed UU, those within the range between the two are normal, and those with no workload are idle servers. Also, if a request fails to find any server meeting its requirement, then a new server is brought to the active state from the idle (sleep mode) servers and is assigned to the UU server category. This profiling can be viewed in Fig. 2. Migration profiling: Server consolidation introduces the concept of VM migration in cloud computing. The VM migration process involves the selection of UU and/or OU servers and thereafter sharing their workloads with servers capable of holding them. However, it is kept in view that the servers receiving the workload do not themselves become OU (i.e. they keep working normally). Thus, the servers from which workload is migrated are left either idle or normally utilized, respectively. The idle servers are then switched to a sleep or dead state, avoiding unnecessary energy consumption. Figure 2. Profiling strategy of workload. The contribution aims at attaining the profiling of servers for performing migration. This seeks the determination of OU and/or UU servers. For this purpose, the servers are categorized according to their working capacities by applying clustering approaches such as K-means, Fuzzy C-Means (FCM), Spectral and the proposed EM. Brief descriptions of these clustering approaches are as follows. K-means [4, 39] is regarded as one of the simplest unsupervised learning algorithms. The main feature of K-means is its simplicity and ease of implementation. The K in K-means defines the means/centers, or the number of clusters/partitions desired on the data set. It uses the concept of the nearest mean (generally implemented with the Euclidean distance) around which clusters (of data points) are formed. However, it fails to provide fine clustering for highly overlapping data. FCM [36] is an extension of K-means clustering where every data point in the data set belongs to every cluster to a certain degree.
It is supposed that a data value that lies close to the center of a cluster of data objects will have a high membership in that cluster, while another data value that lies far from the center of a cluster will have a low membership in that cluster. Spectral clustering [22] is a pairwise clustering algorithm which calculates a similarity matrix and uses its eigenvalues, or spectrum. It performs nonlinear dimensionality reduction, forming two parts of the data based on an eigenvalue. The algorithm divides the data points into two parts using the second smallest eigenvalue of the symmetric normalized Laplacian. It is a clustering technique with promising application prospects for data (or object) clustering. The proposed EM algorithm [20, 35] for the cloud, in contrast, is a first-order algorithm which focuses on the predictive aspect of data modeling. The EM algorithm is used to estimate the maximum likelihood parameters of a statistical model locally. Studies also state that EM converges to a local maximum for multimodal distributions. With respect to the cloud, the categorization is performed among the active servers with tasks; hence, obtaining a local optimum fits the problem. EM may converge slowly on Gaussian mixture problems whose mixture components are not well separated. It has the ability to satisfy probabilistic constraints automatically and to converge monotonically. Also, it provides better results in cases where the likelihood is the more desirable criterion. The proposed approach used for the profiling is discussed in the next section. 4. THE PROPOSED APPROACH This section proposes a new server consolidation model for achieving higher performance by reducing the total number of active servers fulfilling the incoming requests. The model, in contrast to the existing models in the literature, distinguishes itself by redirecting the user request to the resource cluster pool which can manage the user task and subsequently allocating the relevant resources from the computation server which can afford them. Thereafter, the system checks periodically for server sprawl or under-utilization in order to maintain high performance. For this, it keeps the servers categorized according to their workload capacities, naming them UU, OU and N servers. The management of servers according to their state enables easier migration and handling of idle servers. The corresponding workflow for server consolidation with categorized servers is shown in Fig. 3. Figure 3. Workflow for categorizing servers according to their behavior. Whenever a request arrives from a cloud user, the request is calibrated by finding out its resource demand. If the request is relevant for the resources, it is forwarded to the cluster of servers having the required resource. After allocating the request (in the form of a task) to a resource cluster (of servers), the best-fit algorithm (or another relevant algorithm) is applied in order to determine the most suitable server to address the request. The best-fit server is normally found by arranging all the cluster servers in order of their residual capacity. Thereafter, the category of the server is estimated. Each active server here is categorized according to its current working status into OU, UU, N and idle. The workload status is checked periodically for every active server so as to update its category according to its current working threshold status. 4.1.
Algorithm for server consolidation It is believed that for effective fulfillment of client requests, the allotment of tasks should be performed appropriately. Hence, in order to address the requests, the active servers are kept clustered into low-end (L), medium-end (M) and high-end (H) servers with respect to their resource capabilities. According to the demand, a request is forwarded to the relevant cluster of servers, viz. L, M or H. After obtaining the cluster which suits the resource requirements, the request searches for its best allocation server in the list. As soon as the request finds a suitable server, it gets executed on it. Further, it is attempted to keep all the active servers with requests/workloads in execution classified into under-, over- and normally utilized. The pseudo code for the server consolidation algorithm is shown in Algorithm 1.
Algorithm 1 Consolidation Algorithm.
1: procedure Request Allocation and Consolidation
2:   Arrival of request.
3:   Search for resource availability:
4:   if not found then
5:     Drop request
6:   else
7:     Forward request
8:   end if
9:   Select L, M or H cluster based on the demanded resources.
10:  Search for best suitable server:
11:  if not found then
12:    Turn on a new server and assign request to it
13:  else
14:    Assign request to the server
15:  end if
16:  After assignment, check the current working capacity of servers and categorize them into UU, OU and N servers.
17:  Perform migration for the determined UU and OU servers.
18: end procedure
This categorization of active servers makes it possible to uncover the servers whose chances for migration are considerable. For this, EM is expected to provide beneficial results. 4.2. Categorization of active servers using EM The servers running in a cloud are made to perform VM migration in case of over-utilization or under-utilization. Hence, it is emphasized to keep track of these servers and maintain a record of OU, UU and N capacity servers. For mapping the category of active servers, the proposed approach uses the EM concept. The EM algorithm, using statistics, iteratively finds the maximum likelihood of a server belonging to a particular cluster type (of servers). It determines the parameters of the probability distribution by obtaining the maximum likelihood and then tries to maximize the obtained data log likelihood on the basis of the observations, revealing the cluster to which a server will belong. Let there be a set of servers S, with UU servers denoted W_1, N servers W_2 and OU servers W_3.
The categorization of servers is achieved as follows. Estimation of mean values by EM [30]: Assuming the server set S can be modeled as a Gaussian mixture,

p(S) = p(S/W_1)P(W_1) + p(S/W_2)P(W_2) + p(S/W_3)P(W_3),  (1)

where p(S) is the probability density function (PDF) of the changing workload in the server set S, and p(S/W_1), p(S/W_2) and p(S/W_3) are the class-conditional PDFs of the UU (W_1), N (W_2) and OU (W_3) servers. Also, P(W_1), P(W_2) and P(W_3) are the a priori probabilities of the UU, N and OU servers, respectively. For Gaussian classes, the PDFs p(S/W_1), p(S/W_2) and p(S/W_3) can be written as

p(S/W_k) = \frac{1}{\sigma_k \sqrt{2\pi}} \exp\left( -\frac{(S - \mu_k)^2}{2\sigma_k^2} \right),  (2)

where k ∈ {1, 2, 3}, and \mu_k and \sigma_k are the mean and standard deviation of server class W_k, respectively. The estimation of the mean values is done by formal initialization of the mean \mu_k, the variance \sigma_k^2 and the a priori probability P(W_k), which helps in obtaining the initial UU, OU and N running servers. A threshold t for the changing-workload server is set from the empirical equation

t = \mu_S + R \cdot \sigma_S,  (3)

where R is a constant, and \mu_S and \sigma_S are the mean and standard deviation of the server set S, respectively. Thereafter, the values of \mu, \sigma and P(W_k) calculated from the classified capacities are used as initial values for EM. Expectation estimation: From (1) and (2), the posterior probabilities P(W_k/s_i) are calculated using

P(W_k/s_i) = \frac{P(W_k)\, p(s_i/W_k)}{p(s_i)},  (4)

where i ∈ {1, ..., n} and s_i is the ith analyzed server from the server set S containing n available servers. Maximization estimation: For maximization, re-estimate the parameters using the equations

P^{(I+1)}(W_k) = \frac{1}{n} \sum_{i=1}^{n} P^{I}(W_k/s_i),  (5)

\mu_k^{I+1} = \frac{\sum_{i=1}^{n} P^{I}(W_k/s_i)\, s_i}{\sum_{i=1}^{n} P^{I}(W_k/s_i)},  (6)

(\sigma_k^2)^{I+1} = \frac{\sum_{i=1}^{n} P^{I}(W_k/s_i)\,(s_i - \mu_k)^2}{\sum_{i=1}^{n} P^{I}(W_k/s_i)},  (7)

where I is the iteration index of the maximization step. Now check for convergence; if not converged, repeat the expectation and maximization steps, leading to the final mean values. 5. EXPERIMENTATION AND RESULTS The experimentation for estimating the categorization of servers has been performed using the data set available from the UCI Machine Learning Repository [21]. The data set includes machine cycle time in nsec (MYCT), minimum memory in KB (MMIN), maximum memory in KB (MMAX), cache in KB (CACH), minimum channels in units (CHMIN) and maximum channels in units (CHMAX). The analysis is performed considering a Cloud scenario where it is assumed that these are servers running in the cloud with their maximum and minimum working capabilities defined. The platform used for the deployment is a 32-bit quad-core processor with 8 GB of RAM. Using the available information about the servers, EM-based clustering of the servers is achieved. Since it is difficult to present all 209 entries here, indicative entries of the data set used for experimentation are shown in Table 1. Table 1. Partial data set of servers.
Servers  MYCT (nsec)  MMIN (KB)  MMAX (KB)  CACH (KB)  CHMIN (units)  CHMAX (units)
1   0.72825  0.06012  0.92842  10       3.07692  7.27272
2   0.08091  2.48496  4.99499  1.25000  1.53846  1.81818
3   0.08091  2.48496  4.99499  1.25000  1.53846  1.81818
4   0.08091  2.48496  4.99499  1.25000  1.53846  1.81818
5   0.08091  2.48496  2.49249  1.25000  0.90909  1.53846
6   0.06068  2.48496  4.99499  2.50000  1.53846  1.81818
7   0.04045  4.98997  4.99499  2.50000  1.81818  3.07692
8   0.04045  4.98997  4.99499  2.50000  1.81818  3.07692
9   0.04045  4.98997  10       2.5000   1.81818  3.07692
10  0.04045  10       10       5        3.63636  6.15384
11  2.58260  0.29308  0.45920  0        0.11363  0.19230
12  2.58260  0.14028  0.53741  0.15625  0.19230  0.34090
13  0.28995  0.60621  1.24124  2.53906  0.19230  0.45454
14  0.22252  1.23246  2.49249  2.53906  0.19230  0.45454
15  2.24544  0        0        0        0.19230  0.22727
16  1.23398  0.14028  2.49249  0        0.76923  1.81818
17  1.01146  0.14403  0.30280  0.31250  0.76923  0.85227
18  0.84962  0.14028  0.77202  0        1.34615  1.81818
19  0.84962  0.29308  0.30280  0        0.90909  0.96153
20  0.62710  0.77202  1.54559  5.54687  1.53846  3.63636
The data values have been normalized/fuzzified in order to map them into [0, 1]. The data set has been normalized to keep uniformity in the analysis of the clustering algorithms. Also, for comparing the clustering by EM, K-means and Spectral with FCM, the data have to undergo fuzzification. This fuzzification helps in effective clustering of the servers. Considering each column of the data set, the PDF estimated as part of the estimation of each class is shown in Fig. 4. Figure 4. Data values with their respective density. It is known that EM estimates the probability of each point (i.e. server) belonging to a cluster and then, in the maximization step, estimates the parameter vectors from the PDF of each class; hence, the PDF for each server value needs to be determined. The PDFs estimated for each data value in the server data set are shown in Fig. 5. Figure 5. PDF for each value of the data set. After obtaining the density values, clustering of servers is performed along with the estimation of the mean and sigma values.
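To make the expectation and maximization updates of Equations (4)-(7) concrete, the following sketch runs a simple one-dimensional EM over hypothetical normalized per-server load values, with the three mixture components playing the role of the UU, N and OU classes. The load values, the initialization and the stopping threshold are illustrative assumptions, not the paper's experimental setup (which uses the UCI data set above):

```python
# Sketch of the E-step/M-step updates of Eqs. (4)-(7) on hypothetical
# one-dimensional server load values; not the paper's actual experiment.
import numpy as np

def em_server_clusters(loads, n_iter=100, tol=1e-6):
    s = np.asarray(loads, dtype=float)
    n, k = len(s), 3                      # three classes: UU, N, OU
    # Crude initialization: split the sorted loads into three groups.
    parts = np.array_split(np.sort(s), k)
    mu = np.array([p.mean() for p in parts])
    var = np.array([p.var() + 1e-3 for p in parts])
    prior = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step (Eq. 4): posterior P(W_k | s_i) via Bayes' rule.
        pdf = np.exp(-(s[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        joint = prior * pdf                      # shape (n, k)
        post = joint / joint.sum(axis=1, keepdims=True)
        # M-step (Eqs. 5-7): re-estimate priors, means and variances.
        nk = post.sum(axis=0)
        prior = nk / n
        mu = (post * s[:, None]).sum(axis=0) / nk
        var = (post * (s[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        ll = np.log(joint.sum(axis=1)).sum()     # observed-data log likelihood
        if abs(ll - prev_ll) < tol:              # convergence check
            break
        prev_ll = ll
    return post.argmax(axis=1), mu, np.sqrt(var), ll

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical normalized loads: mostly normal, some under- and over-utilized.
    loads = np.concatenate([rng.normal(0.15, 0.05, 20),   # UU-like
                            rng.normal(0.55, 0.08, 50),   # N-like
                            rng.normal(0.90, 0.04, 15)])  # OU-like
    labels, means, sigmas, ll = em_server_clusters(loads)
    print("cluster means:", means.round(3), "log likelihood:", round(ll, 2))
```

In practice, the initialization would come from the threshold rule of Equation (3) and the updates would run over the multi-attribute server records; the sketch only illustrates the structure of the updates.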
The (minimum, maximum) values determined for the mean and standard deviation are (0.0777, 3.7591) and (−1.3758, 5.9240), respectively. It is found that 21 iterations, with a log likelihood of −1210.53, were required to achieve convergence. It is also found that each cluster of servers from the data set has distinct values, indicating that there is no outlier and no data value with a conflicting assignment to more than one cluster. In a cloud scenario, having servers with significant distinguishability is enormously beneficial, providing ease of relevant task allocation. When tasks are allocated to a server, it is ensured that the server lies among the elements (i.e. servers) of the appropriate cluster for efficient use of server capability. The clusters formed using the density, mean and sigma values can be visualized in Fig. 6. Figure 6. Clusters formed by EM. For comparison purposes, the EM clustering is compared with the conventional form of clustering, K-means, FCM and spectral clustering. The conventional approach emphasizes clustering by sorting the servers based on their remaining capacities (or, in some cases, the capacities occupied by executing tasks) and then dividing the obtained sorted list of servers into UU, N and OU using pre-set thresholds. Ismaeel and Miri [39] present FCM clustering on a cloud data center as a better approach than K-means, improving overall energy consumption. Also, Jin et al. [22] formed clusters of servers incorporating the spectral clustering approach, with efficient parallelism as the focus in a cloud scenario. The two approaches, FCM and spectral, use K-means and eigenvalue(s), respectively, as the basis of cluster formation, while the utilized EM algorithm determines the maximum likelihood of a data point (or server) belonging to a cluster type, which gives more uniform clustering with fewer outliers. Spectral clustering transforms the problem into a tractable eigenvector problem on the basis of a standard relaxation procedure, which adds an extra computational cost to the approach. The limitation of FCM is likewise its high computational cost; this makes EM the more preferable approach for clustering. On comparing the clusters of servers obtained by applying the EM approach with the K-means, FCM, spectral and conventional ways of clustering servers, it is found that EM provides more uniform and efficient clustering than FCM, as shown in Fig. 7. Figure 7. Cluster comparison between Conventional, K-means, FCM, Spectral and EM. It is visible from Fig. 7 that the conventional approach detects a higher number of UU servers, which is not desirable as servers should be made to run maximally in the N state; thus, the distribution should be uniform. In comparison with K-means, FCM and Spectral, EM gives a more uniform cluster (of servers) set. This allows the cloud to restrict requests to servers that are near OU and to focus on appending requests to the determined UU servers. 6. CONCLUSION The paper presents a distinguishing approach based on EM for differentiating and clustering the servers in a cloud according to their capacities. Relating to the current capacities of the active servers, EM formulates distinguishable UU, N and OU clusters of servers. The experimentation shows that the application of EM for clustering on the data set forms more uniformly spread clusters in comparison to the other available clustering approaches.
REFERENCES
1 Khanna, G., Beaty, K., Kar, G. and Kochut, A. (2006) Application Performance Management in Virtualized Server Environments. Proc. 10th Network Operations and Management Symposium (NOMS), Vancouver Convention and Exhibition Center, Vancouver, Canada, April 3–7, pp. 373–381. IEEE/IFIP.
2 Srikantaiah, S., Kansal, A. and Zhao, F. (2009) Energy aware consolidation for cloud computing. ACM Cluster Comput., 12, 1–15.
3 Abdulgafer, A.R., Marimuthu, P.N. and Habib, S.J. (2009) Network Redesign through Servers Consolidations. Proc. 11th Int. Conf. Information Integration and Web-based Applications and Services (iiWAS2009), Kuala Lumpur, Malaysia, December 14–16, pp. 623–627. ACM.
4 Dalton, L., Ballarin, V. and Brun, M. (2009) Clustering algorithms: on learning, validation, performance, and applications to genomics. J. Curr. Genomics, 10, 430–445.
5 Gong, Z. and Gu, X. (2010) PAC: Pattern-driven Application Consolidation for Efficient Cloud Computing. Proc. Annual Int. Symp. Modelling, Analysis and Simulation of Computer and Telecommunication Systems, August 17–19, pp. 24–33. IEEE/ACM.
6 Ho, Y., Liu, P. and Wu, J. (2011) Server Consolidation Algorithms with Bounded Migration Cost and Performance Guarantees in Cloud Computing. Proc. Fourth Int. Conf. Utility and Cloud Computing, Melbourne, Australia, December 5–7, pp. 154–161. IEEE.
7 Ye, K., Jiang, X., Huang, D., Chen, J. and Wang, B. (2011) Live Migration of Multiple Virtual Machines with Resource Reservation in Cloud Computing Environments. Proc. Fourth Int. Conf. Cloud Computing, Washington DC, USA, July 4–9, pp. 267–274. IEEE.
8 Lee, S. and Sahu, S. (2011) Efficient Server Consolidation Intra-Cluster Traffic. Proc. Global Telecommunications Conference (GLOBECOM 2011), Houston, Texas, USA, December 5–9, pp. 1–6. IEEE.
9 Marzolla, M., Babaoglu, O. and Panzieri, F. (2011) Server Consolidation in Clouds through Gossiping. Proc. 12th Int. Symp. World of Wireless, Mobile and Multimedia Networks (WoWMoM), Lucca, Italy, June 20–24, pp. 1–6. IEEE.
10 Pop, C.B., Anghel, I., Cioara, T., Salomie, I. and Vartic, I. (2012) A Swarm-inspired Data Center Consolidation Methodology. Proc. 2nd Int. Conf. Web Intelligence, Mining and Semantics, Craiova, Romania, June 13–15, Article No. 41. ACM, New York, NY, USA.
11 Horaud, R., Forbes, F., Yguel, M., Dewaele, G. and Zhang, J. (2012) Rigid and articulated point registration with expectation conditional maximization. IEEE Trans. Pattern Anal. Mach. Intell., 33, 587–602.
12 Huang, Z., Tsang, D.H.K. and She, J. (2012) A Virtual Machine Consolidation Framework for MapReduce Enabled Computing Clouds. Proc. 24th Int. Teletraffic Congress (ITC), September 4–7, pp. 1–8. IEEE.
13 Liu, X., Wang, C., Zhou, B.B., Chen, J., Yang, T. and Zomaya, A.Y. (2013) Priority-based consolidation of parallel workload in the cloud. IEEE Trans. Parallel Distrib. Syst., 24, 1874–1883.
14 Mastroianni, C., Meo, M. and Papuzzo, G. (2013) Probabilistic consolidation of virtual machines in self-organizing cloud data centers. IEEE Trans. Cloud Comput., 1, 215–228.
15 Beloglazov, A. and Buyya, R. (2013) Managing overloaded hosts for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints. IEEE Trans. Parallel Distrib. Syst., 24, 1366–1379.
16 Nevithitha, S. and Sriram, V.S.S. (2013) Consolidated batch and transactional workloads using dependency structure prioritization. Int. J. Eng. Technol. (IJET), 5, 1328–1334.
17 Singh, A.N. and Hemalatha, M. (2013) Cluster based bee algorithm for virtual machine placement in cloud data center. J. Theor. Appl. Inf. Technol., 57, 1–10.
18 Mauch, V., Kunze, M. and Hillenbrand, M. (2013) High performance cloud computing. Future Gener. Comput. Syst., 29, 1408–1416.
19 Lin, J., Zha, L. and Xu, Z. (2013) Consolidation cluster systems for data centers in the cloud age: a survey and analysis. Front. Comput. Sci., 7, 1–19.
20 Soltanmohammadi, E. and Naraghi-Pour, M. (2013) Blind modulation classification over fading channels using expectation maximization. IEEE Commun. Lett., 17, 1692–1695.
21 Lichman, M. (2013) UCI Machine Learning Repository. http://archive.ics.uci.edu/ml. University of California, School of Information and Computer Science, Irvine, CA.
22 Jin, R., Kou, C., Liu, R. and Li, Y. (2013) Efficient parallel spectral clustering algorithm design for large data sets under cloud computing environment. J. Cloud Comput. Adv. Syst. Appl., 2, 1–10.
23 Corradi, A., Fanelli, M. and Foschini, L. (2014) VM consolidation: a real case based on OpenStack cloud. Future Gener. Comput. Syst., 32, 118–127.
24 Kumar, S., Manvi, S. and Shyam, G.K. (2014) Resource management for infrastructure as a service (IaaS) in cloud computing: a survey. J. Netw. Comput. Appl., 41, 424–440.
25 Hsu, C., Slagter, K.D., Chen, S. and Chung, Y. (2014) Optimizing energy consumption with task consolidation in clouds. Inf. Sci., 258, 452–462.
26 Jula, A., Sundararajan, E. and Othman, Z. (2014) Cloud computing service composition: a systematic literature review. J. Expert Syst. Appl., 41, 3809–3824.
27 Thakur, S., Kalia, A. and Thakur, J. (2014) Performance evaluation of server consolidation algorithms in virtualized cloud environment with constant load. Int. J. Adv. Res. Comput. Sci. Softw. Eng., 4, 555–562.
28 Aiash, M., Mapp, G. and Gemikonakli, O. (2014) Secure Live Virtual Machines Migration: Issues and Solutions. Proc. 28th Int. Conf. Advanced Information Networking and Applications Workshops, Victoria, Canada, May 13–16, pp. 160–165. IEEE.
29 He, L., Zou, D., Zhang, Z., Chen, C., Jin, H. and Jarvis, S.A. (2014) Developing resource consolidation frameworks for moldable virtual machines in clouds. Future Gener. Comput. Syst., 32, 69–81.
30 Hao, M., Shi, W., Zhang, H. and Li, C. (2014) Unsupervised change detection with expectation maximization-based level set. IEEE Geosci. Remote Sens. Lett., 11, 210–214.
31 Esfandiarpoor, S., Pahlavan, S. and Goudarzi, M. (2015) Structure-aware online virtual machine consolidation for datacenter energy improvement in cloud computing. Comput. Electr. Eng., 42, 74–89.
32 Shen, D., Luo, J., Dong, F., Fei, X., Wang, W., Jin, G. and Li, W. (2015) Stochastic modeling of dynamic right-sizing for energy-efficiency in cloud data centers. Future Gener. Comput. Syst., 48, 82–95.
33 Rao, K.S. and Thilagam, P.S. (2015) Heuristics based server consolidation with residual resource. Future Gener. Comput. Syst., 50, 87–98.
34 Ahmad, R.W., Gani, A., Hamid, S.H.A., Shiraz, M., Yousafzai, A. and Xia, F. (2015) A survey on virtual machine migration and server consolidation frameworks for cloud data centers. J. Netw. Comput. Appl., 52, 11–25.
35 Maiti, A. and Mukherjee, A. (2015) On the Monte-Carlo expectation maximization for finding motifs in DNA sequences. IEEE J. Biomed. Health Inform., 19, 677–686.
36 Stetco, A., Zeng, X. and Keane, J. (2015) Fuzzy C-means++: fuzzy C-means with effective seeding initialization. Expert Syst. Appl., 42, 7541–7548.
37 Xia, Y., Zhou, M.C., Luo, X., Zhu, Q., Li, J. and Huang, Y. (2015) Stochastic modeling and quality evaluation of infrastructure-as-a-service clouds. IEEE Trans. Autom. Sci. Eng., 12, 162–170.
38 Zhang, S., Qian, Z., Luo, Z., Wu, J. and Lu, W. (2016) Burstiness-aware resource reservation for server consolidation in computing clouds. IEEE Trans. Parallel Distrib. Syst., 27, 964–997.
39 Ismaeel, S. and Miri, A. (2016) Energy-Consumption Clustering in Cloud Data Center. Proc. 3rd MEC Int. Conf. Big Data and Smart City, Muscat, Oman, March 15–16, pp. 1–6. IEEE.
40 Garriga, J., Palmer, J.R.B., Oltra, A. and Bartumeus, F. (2016) Expectation-maximization binary clustering for behavioural annotation. PLoS One, 11, 1–26.
Author notes: Handling editor: Xiaohui Liu. © The British Computer Society 2017. All rights reserved.
TI - A Resource Efficient Expectation Maximization Clustering Approach for Cloud JF - The Computer Journal DO - 10.1093/comjnl/bxx043 DA - 2018-01-01 UR - https://www.deepdyve.com/lp/oxford-university-press/a-resource-efficient-expectation-maximization-clustering-approach-for-qRxYyopF9D SP - 95 EP - 104 VL - 61 IS - 1 DP - DeepDyve ER -