TY - JOUR
AU - Levi, Albert
AB - Abstract

Grid-based networks are formed by applications in which the objects being monitored form a square grid. These applications often raise critical security concerns, as a compromise of the data has adverse effects. Several lightweight data aggregation schemes have been proposed in the literature that aim to minimize the resource overhead while providing the required security attributes. However, as per our observations, none of these attempts exploits information about the actual deployment of the nodes. In this paper, we exploit the linearity in the deployment of grid-based networks to design an aggregation scheme that entails lower overhead than the existing schemes while offering the same level of resilience. As homomorphic encryption is considered to be more secure than obfuscation, we also propose a variant of the proposed scheme that uses homomorphic encryption. In scenarios with fewer restrictions on computation, one can opt for this variant. The security of the proposed scheme is established through an analysis using mathematical induction and formal security proofs. To the best of our knowledge, the proposed scheme is the first that achieves privacy preservation, data verification, resilience against node capture and avoidance of collusion attacks in grid-based networks with lower key storage and communication cost.

1. INTRODUCTION AND MOTIVATION

Grid-based networks are formed by deploying sensor nodes in a square grid fashion. Applications in which the objects being monitored are distributed in a square grid inherently form grid-based networks.
Examples include monitoring the traffic level of city streets, monitoring trees in a forest project, monitoring goods in a warehouse, and monitoring the electricity consumption of houses laid out row by row through sensor nodes or smart meters. All of these examples involve a collection of sensor nodes (meters) that monitor their surroundings and report the data to the BS (Base Station), as discussed in [1, 2]. These nodes are constrained with regard to computation capability, battery power, storage, bandwidth, etc. [3–5]. A typical example of nodes forming a square grid is shown in Fig. 1. The example shows 50 houses of a colony deployed in a square grid manner. Each house has a sensor node (smart meter) whose job is to take readings and forward them to its neighbor node in an aggregated manner. The same example applies to any application that forms a square grid of nodes. Aggregation is used to conserve communication cost: instead of sending N different readings (N is the total number of nodes in the network), an aggregated value is sent to conserve bandwidth and reduce communication traffic, as discussed in [6]. However, as the data being communicated are exposed to intermediate nodes, their security becomes vital [7, 8]. An obvious way to safeguard the data being communicated is to encrypt them with pairwise keys shared by each pair of intermediate nodes [9–11]. The storage required in networks using pairwise key deployment is of the order of O(N^2) for a network consisting of N nodes. A survey of secure data aggregation schemes is given in [7], and lightweight schemes are discussed in [12, 13]. However, unlike the proposed scheme, these schemes do not take the deployment knowledge of sensor nodes or smart meters into consideration.

Figure 1. Considered scenario of colony.
The proposed scheme takes advantage of the linearity in the deployment of grid-based networks, i.e. a single row of deployed houses forms a straight line of 10 houses. An aggregation protocol designed with linearity of deployment in mind is presented in [14]. We extend this concept to grid-based networks and introduce privacy and verifiability of data, and avoidance of node collusion attacks, in the proposed aggregation scheme. Compromised intermediate nodes reveal aggregated data that can be used for malicious purposes. Therefore, privacy of the data at intermediate nodes becomes important [8, 15, 16]. In privacy homomorphism [17, 18], operations are applied on ciphertexts and, as decryption is not required before processing the data, privacy is preserved. However, an active adversary who has compromised an intermediate node can alter the encrypted value, because homomorphic encryption is malleable [19]. The aggregated result received by the aggregator node or the BS is then wrong. Hence, verification of aggregated data becomes vital, and it is challenging to achieve, as discussed in [20]. Homomorphic Message Authentication Code (HMAC) [21] is used in [22] for malleability resilient concealed data aggregation. However, it provides only end-to-end verification, which is a problem because the integrity check has to wait until the message reaches the BS. The proposed scheme solves this by using keys already distributed between nodes to compute Message Authentication Code (MAC) values and verifying them at intermediate nodes as well as at the BS. This saves the energy of nodes, as altered messages are not forwarded to the BS. The proposed scheme provides resilience against node capture attacks (in which the adversary captures the nodes or meters physically).
The resilience is provided up to a predefined value k, where k is the number of consecutive captured nodes. The proposed scheme is dynamic in that it adjusts itself according to the value of k, and the distribution of keys is based on it. The value of k is decided in advance by the system designer and depends on the amount of resilience the designer believes is sufficient to handle node captures. We assume that an adversary captures nodes in a consecutive manner only, as this allows it to disrupt the network by capturing fewer nodes, as discussed in [14, 23, 24]. As shown by our mathematical analysis and associated proofs, the proposed scheme is resilient against node capture attacks. The proposed scheme requires only two keys per node to provide the same level of resilience as the other existing schemes. Collusion attacks are avoided by adding random shares to the actual value. The analysis of the number of messages to be communicated (Section 6.1) shows that the proposed scheme requires fewer messages to provide the same level of security as other schemes. As homomorphic encryption is considered to be more secure than the use of random shares, we also present a variant of the proposed scheme that uses homomorphic encryption. However, it requires one more key and more computation for encryption. Therefore, in scenarios with fewer restrictions on computation and key storage, one can opt for the variant. To the best of our knowledge, the proposed data aggregation scheme is the first to achieve privacy preservation, verification of data at intermediate nodes as well as at the BS, resilience against node captures and avoidance of collusion attacks in grid-based networks with lower key storage and communication cost.

1.1. Our contributions

The goal of our work is to design a secure data aggregation scheme for grid-based networks with lower key storage and communication cost than the existing schemes described in Table 1 in Section 3. The observation that the deployment of grid-based networks is linear distinguishes the proposed scheme from the other existing schemes. The major contributions of the paper are as follows:

Pairwise KPS (Key Predistribution Scheme), as discussed in [9–11], requires each node to share keys with all the other nodes, i.e. in a network of N nodes, each node shares pairwise keys with (N−1) nodes. Therefore, the number of keys required for the network is O(N^2). The KPS inspired by the Costas array, as discussed in [38], requires each node to store as many keys as the size of the Costas array considered. The hierarchical grid-based KPS discussed in [39, 40] requires log N keys per node. The ultra-lightweight KPS discussed in [23] requires each node to store four keys. The proposed aggregation scheme requires each node to store two keys, i.e. O(1) keys per node, to provide the same level of resilience as existing schemes. Therefore, the proposed scheme minimizes the key storage requirement. The KPS used for the proposed aggregation scheme is inspired by the scheme we proposed in [24].

Sharing pairwise keys with aggregator nodes and providing end-to-end verification [25, 28, 35] is simpler than the verification provided in the proposed scheme. However, it incurs unnecessary communication overhead in the network, as we have to wait until aggregator nodes verify the data. The proposed aggregation scheme avoids this overhead by providing hop-by-hop verification. It thereby saves the energy of nodes and improves the lifetime of the network.
The proposed scheme and the variant (using homomorphic encryption) provide privacy and integrity of data with lower key storage and communication overhead than existing schemes. We prove the security of the proposed scheme with the help of a formal security proof and a data indistinguishability game.

The proposed scheme is resilient against node capture attacks, unlike the schemes discussed in [7, 8]. Our mathematical analysis formally proves the resilience of the proposed aggregation scheme against such attacks, in which nodes are captured physically to disrupt the service.

The data aggregation schemes discussed in [14, 41] do not consider collusion attacks. The proposed scheme avoids node collusion attacks, i.e. nodes cannot collude to infer information that they are not supposed to infer individually.

Table 1. Summary of existing privacy preserving solutions in Smart Grid.

Technique/idea used | Reference no. | Observations
Paillier's Homomorphic Encryption Scheme | [25, 26] | Privacy is guaranteed; amount of computation required to apply homomorphic encryption and decryption is not discussed; homomorphic encryption is inherently malleable; HMAC can be used for malleability resistance [27]
Anonymization | [28] | Simple and requires less communication; identity collision for credentials is unavoidable
Blinding Factor | [29] | Privacy enhanced data aggregation through blinded data; requires off-line Trusted Third Party (TTP)
LSM (Load Signature Moderation) | [30] | Rechargeable battery along with the power mixing algorithm is used; smart grid functionality is not considered; cost effectiveness of the same idea is discussed in [12]
3rd Party Escrow | [31] | Uses two identifiers: high frequency and low frequency; high frequency IDs are not disclosed; feasibility and cost associated with having two IDs are not discussed
Intelligent computer software | [32] | Uses privacy manager; privacy manager can use defined features such as pseudonymity for providing privacy; only a theoretical model is proposed
Proxy Agent | [33] | Smart Grid scenario is considered as client-server model; a proxy agent collects, anonymizes and processes data; general idea not focused on smart grid scenario
Masking | [34] | Deals with securely aggregating meter readings; the provider learns no information besides the aggregate; masking of data helps in achieving privacy
Privacy preserving authentication | [35] | Tamper resilient device is attached for the registered users; device generates pseudo identities and signatures on messages; delay in message authentication is the major issue
Probability distribution and data mining | [36] | Empirical Probability Distribution is used; content of the power signal is analysed from the viewpoint of privacy; different probability distributions are not considered
Perturbation | [37] | Perturbation guarantees privacy in meter measurement and utility in perturbed value; provides merely a theoretical framework
1.2. Organization of the Paper

The rest of the paper is organized as follows. Section 2 covers the background related to the proposed scheme. The related work for the proposed data aggregation scheme is discussed in Section 3. Section 4 presents the system model and adversary model for the proposed scheme. Section 5 discusses the proposed data aggregation scheme. Section 6 analyses the communication overhead and security, and presents the theoretical and mathematical analysis along with the associated theorems and proofs. Section 7 summarizes our work with conclusions and future directions.

2. BACKGROUND

Secure data aggregation in grid-based networks can be achieved through homomorphic encryption [42–44], as there is no need to decrypt the data at intermediate nodes. Homomorphic encryption works on ciphertext data, and the operations applied yield the same results as when they are applied on plaintext data. The idea of privacy homomorphism was proposed by Rivest et al. [45], who showed how operations such as addition and multiplication can be carried out on encrypted data without losing privacy. The use of the homomorphic scheme proposed in [17] for operations over encrypted data at intermediate nodes is discussed in [46, 47]. These schemes provide concealed data aggregation (where the privacy homomorphism property is used to perform data aggregation in the ciphertext domain). The work on privacy preserving secure data aggregation using homomorphic encryption is based on either symmetric keys [15, 17, 42, 43, 48] or asymmetric keys [44, 49, 50]. The variant of our proposed scheme uses the idea of homomorphic encryption.
In the variant of the proposed scheme, we substitute homomorphic encryption for obfuscation. Homomorphic encryption has a higher computation cost (hence consumes more energy) than obfuscation but provides more security. Therefore, depending on the security the application requires and on the computation capability (energy) of the nodes, one can choose either the obfuscation based or the homomorphic encryption based approach for providing privacy. The monitoring applications considered require message integrity, as any change to the actual value by intermediate nodes makes the final result wrong. Several solutions [22, 51–53] provide integrity of data in Wireless Sensor Networks (WSNs). We provide integrity of data through a keyed MAC function that is resistant to collision attacks.

3. RELATED WORK

Security in WSNs is an important and well-researched topic. The protocols presented in [54] discuss in detail how security is achieved in resource constrained environments such as WSNs. An alternative approach for providing security with lower overhead in terms of energy, latency and bandwidth is discussed in [55], which introduces a security architecture at the link layer. The challenges in achieving security are discussed in detail in [56, 57]. We focus on designing a data aggregation scheme that addresses the privacy and verification of data and provides resilience against node capture attacks in grid-based networks. An overview and survey of secure data aggregation in WSNs is given in [7, 58]. These schemes provide security with regard to message authentication and replay protection, but they are less dependable when the deployment environment is hostile. Moreover, intermediate nodes can alter or monitor the data communicated through them, which compromises the privacy and the integrity of data.
Therefore, data aggregation protocols that take care of the privacy and integrity of data, as discussed in [15, 46, 59–61], are important. We take the smart grid application as an example scenario to explain why meters require secure aggregation, integrity and resilience in a grid-based network. However, the proposed scheme is applicable to any application forming a grid-based network. We consider the case where smart meters (nodes) are deployed in such a way that they form a grid-based network. Consumers connected to a smart grid through the smart meters outside their houses do not want their day to day life to be exposed to other people. This requirement makes privacy preservation compulsory in smart grid. Privacy plays a major role in smart grid for the following reasons: (i) The data being communicated reveal the consumer's energy usage. By analysing the pattern of the data, an adversary can decide whether the house owner is present in the house, which is a serious breach of privacy. (ii) The knowledge of when a particular user is operating which appliance can be used by thieves for wrongful purposes [62]. (iii) The data can be used by interested parties to obtain personal information such as habits, activities and beliefs [63–65]. (iv) Through Non-intrusive Appliance Load Monitoring (NALM), many serious breaches of privacy can take place [66, 67]. A summary of the privacy preserving aggregation approaches for smart grid is given in Table 1. All the solutions listed in Table 1 have certain advantages and limitations, which are given in the Observations column of the table. We propose a solution inspired by the idea of obfuscation, as it is computationally less expensive than homomorphic encryption. The dictionary meaning of the word obfuscation is “the obscuring of intended meaning in communication, making the message confusing, willfully ambiguous, or harder to understand”. The concept is to lie about the actual value deliberately in order to keep it private.
In the scenario of WSNs (smart grid), the idea is to make sure that none of the intermediate nodes (meters) understands the actual data, while the BS can obtain the actual value from the obfuscated one.

4. SYSTEM MODEL AND ADVERSARY MODEL

In this section, we define the system model considered for the proposed data aggregation scheme. The performance capabilities of each node (meter) and the assumptions involved in designing the proposed aggregation scheme are also described. Moreover, we define the adversary model considered for the proposed scheme.

4.1. System model

We consider applications that inherently place nodes (smart meters) in a square grid. We assume that each node can communicate over at least one hop (i.e. each node can send data at least to its direct neighbor). A detailed study of efficient wireless communication in smart grid is presented in [68, 69]. The study shows that communication in smart grid is possible through 3G, 4G, IEEE 802.11, Zigbee, Wi-Fi, power-lines, WiMAX (Worldwide Interoperability for Microwave Access), etc. We do not consider wired communication in our study because a single cut in the wires is enough for the adversary to disrupt the entire service. Moreover, it is complex to design a wired structure for the applications we consider. We assume that the BS is capable of generating secret shares (r_i's) and communicating them to all the nodes or smart meters of the network. The shares are generated based on the secret sharing scheme proposed in [70]. The idea of a secret sharing scheme is to distribute shares among a group of users; the shares are of no use individually, as the only way to recreate the secret is by combining all the shares. The number of shares depends on the total number of nodes or meters in the network (shares r_1, r_2, r_3, …, r_N are generated if the total number of nodes is N). These shares are used for obfuscating the actual data (d_i's).
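As a rough illustration of the share generation just described, the following sketch splits a random value r into N shares whose sum is r. The modulus M, the function name generate_shares and the use of Python's secrets module are assumptions made for this example only; the construction in [70] may differ in its details.

```python
import secrets


def generate_shares(r, n, modulus):
    """Split r into n additive shares with sum(shares) % modulus == r % modulus."""
    if n < 1:
        raise ValueError("need at least one node")
    # The first n-1 shares are drawn uniformly at random ...
    shares = [secrets.randbelow(modulus) for _ in range(n - 1)]
    # ... and the last share is fixed so that all shares add up to r.
    shares.append((r - sum(shares)) % modulus)
    return shares


M = 2**32                     # hypothetical modulus, larger than any aggregate
r = secrets.randbelow(M)      # random value chosen by the BS
shares = generate_shares(r, 10, M)
print(sum(shares) % M == r)   # True
```

Any proper subset of the shares produced this way is uniformly distributed and therefore says nothing about r, which matches the remark that the shares are of no use individually.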
The actual data (d_i's) sensed by nodes are assumed to lie within a predefined range of possible values, i.e. d_1, …, d_N ∈ {0, …, D−1} (so the maximum allowed data value is D−1). Upon receiving the obfuscated aggregate value, the BS subtracts the random value (r = r_1 + r_2 + r_3 + … + r_N) that was used for generating the secret shares. We assume that the nodes are preloaded with two symmetric keys. One is used for calculating the MAC value, and this key is shared with the node at distance (k+1), where k is the number of consecutive nodes that can be captured. The other key is used for verifying the MAC value. Sharing the MAC key with the (k+1)th node is what provides resilience against k captured nodes with a low key storage requirement. Nodes can lie about their own data, but any attempt to change the data of other genuine nodes is detected by the scheme. We assume that the keyed hash MAC functions computed by sensor nodes are collision resistant. For the variant of the proposed scheme, one more key is stored in each smart meter; it is shared only with the BS and is used for homomorphic encryption and decryption. The transmission medium is considered to be reliable, i.e. transmitted packets are not lost.

4.2. Adversary model

We consider two kinds of adversaries.

4.2.1. Active

Adversaries that are capable of altering, adding and deleting the data being communicated are termed active. They are classified as follows:

Insider: These adversaries have complete access to a legitimate node. They are also known as byzantine adversaries [71].

Outsider: These adversaries are not legitimate nodes, but they take part in the aggregation process by introducing false data.

We use a keyed hash MAC to counter active adversaries, as discussed in the proposed scheme in Section 5.

4.2.2. Passive

Adversaries that solely monitor the data being communicated, without altering them, are termed passive.
Their aim is to deduce something (a secret key or important information that breaches the privacy of the user) from the data being communicated in the network. We use obfuscation or homomorphic encryption to counter passive adversaries, as discussed in the proposed scheme in Section 5. We consider the following attacks and prove the countermeasures against them in Section 6.2.

Node capture attack: In this attack, the adversary captures the nodes physically. It is hard to completely secure the network against this attack. However, the proposed scheme provides resilience up to a predefined value k, where k is the number of consecutive malicious nodes in the network. The resilience against node capture attacks is proved through Theorem 1 in Section 6.2.

Collusion attacks: In this attack, nodes of the network collude in order to infer data they are not supposed to learn, e.g. there are 10 nodes in the network and 9 of them collude to obtain the information of the 10th node. The resilience against node collusion attacks is proved through Theorem 2 in Section 6.2. Meter indistinguishability is proved through a security game and Theorem 3 in Section 6.2.

Attacks in which a node lies about its own reading have a limited effect compared to changing the aggregated results, as discussed in [72, 73]. Therefore, we do not consider such value changing attacks. Work addressing these attacks is discussed in [74]. Moreover, denial of service attacks, packet dropping attacks and communication channel jamming attacks are also not considered in our work. Putting an upper bound on the packet transmission time and the running time of the protocol can help in avoiding such attacks. Work on handling such attacks is discussed in [75].

5. THE PROPOSED SCHEME

In this section, we describe the proposed aggregation scheme that provides privacy, verifiability and resilience against node capture and collusion attacks.
As active attackers are capable of modifying ciphertext values (malleability), verifiability is needed at intermediate nodes to prevent such attacks. The proposed scheme provides it through MAC verification at intermediate nodes. This increases the computation cost, as intermediate nodes have to perform verification, but it reduces the communication cost, as an alteration of the actual value is caught by the intermediate nodes themselves. We do not have to wait until the values reach the BS. This conserves the energy of nodes and avoids the communication involved in forwarding maliciously altered data. In WSNs, communication consumes much more energy than computation [13, 68]. Therefore, we also focus on minimizing the communication cost in order to conserve the energy of nodes. The notations used in the proposed scheme are described in Table 2.

Table 2. Notations used in proposed aggregation scheme.

Symbol | Description
i | Node (meter) ID
m_i | ith node (meter)
N | Total number of nodes in the network
k | Number of consecutive nodes captured
d_i | Plaintext data sensed by node i
r_i | Random share given to node i
X_i | Aggregated data calculated by adding the previous node's aggregate (X_{i−1}), the ith node's data (d_i) and the random share of the ith node (r_i)
t_{i,i+k+1} | MAC value generated on the encrypted aggregate value (X_i) using a key shared between nodes i and (i + k + 1)
MAC_{K_{i,i+k+1}}(X_i) | MAC value of encrypted value X'_i
E_{HK}(X_i) | Homomorphic encryption of value X_i (secret key)
E_{HK_u}(X_i) | Homomorphic encryption of value X_i (public key)

5.1. The proposed aggregation scheme

The aggregation scheme works in two phases.

Initialization phase

The BS selects a random value r, generates secret shares r_1, r_2, …, r_N and distributes them to the N nodes, respectively. The sum of these shares yields the random value r. These shares are used for obfuscating the data. A pairwise symmetric key is preloaded between nodes i and j that are at distance k+1 (i.e.
if the number of consecutive captured nodes allowed is $k=2$, then pairwise symmetric keys are shared between Nodes 1 and 4 ($K_{1,4}$), 2 and 5 ($K_{2,5}$), …, $N-3$ and $N$ ($K_{N-3,N}$), respectively). We use symmetric key ciphers for encryption and decryption. If keys are used for a long duration without being refreshed, the probability of the data being compromised increases, as attacks such as man-in-the-middle become possible. Therefore, the keys must be changed periodically. The period depends on parameters such as the key length, the resources possessed by the adversary, the type of cipher and the security requirements of the considered system or application. We consider the scenario of a smart grid where meters take readings of electricity consumption and send them to the BS. The BS is responsible for further analysis and processing of the received readings, which is done every 24 h for data that are communicated less frequently [31]. Therefore, we run the initialization phase every 24 h in order to refresh the keys. Here, 24 h is not a delay but the period after which the initialization phase is invoked again. There is no strict requirement to keep it at 24 h; the duration can be set according to the security requirements of the considered system or application. How to decide the ideal period for refreshing the keys is not the focus of our work; our focus is on how security is provided if the keys and random shares are compromised during the selected period. This is explained through the security proofs in Section 6.2 (Theorems 1, 2 and 3). The BS can invoke the initialization phase at any point if the need arises, e.g. if the value of $k$ or $N$ changes.

Working phase

1. The first node ($n_1$) adds its secret share ($r_1$) to its data value ($d_1$). This results in an obfuscated value $X_1$.
2. From the second node onwards, each node adds its secret share ($r_i$) to its data value ($d_i$).
This result is then added to the aggregated value received from the previous node ($X_{i-1}$). This yields the aggregated value $X_i$, which is in obfuscated form (as the secret share $r_i$ has been added at each node).
3. Each node generates a MAC value on $X_i$ using the symmetric pairwise key shared between nodes $i$ and $i+k+1$. This results in a hashed value $t_{i,i+k+1}$.
4. Each node forwards its own aggregated value ($X_i$), its hashed value $t_{i,i+k+1}$, the obfuscated aggregate values of the previous $k$ nodes, $X_{i-1}, X_{i-2}, \ldots, X_{i-k}$, and the previous $k$ hashed values $t_{i-1,i+k}, t_{i-2,i+k-1}, \ldots, t_{i-k,i+1}$.

An example of the proposed scheme with six meters ($N=6$) and two consecutive captured nodes ($k=2$) is shown in Fig. 2. The entire flow of the proposed data aggregation scheme for $k=2$ is shown in Fig. 3.

Figure 2. Example of proposed aggregation scheme with k=2.

Figure 3. Proposed aggregation scheme with k=2.

The scheme works for the first $N-(k+1)$ nodes, i.e. for the considered example of 6 nodes, it works for the first 3 ($6-(2+1)$) nodes. If the adversary captures the last $(k+1)$ nodes, there is a problem because no MAC values are generated for them; i.e. if the adversary captures Nodes 4, 5 and 6 in the given example, the aggregation scheme fails, as the BS receives wrong values without knowing that they are maliciously altered. To deal with this problem, we introduce keys for the last $(k+1)$ nodes: Nodes 4, 5 and 6 share keys with the BS directly and generate MACs with those keys, and the BS verifies them. This guarantees that any alteration of data (by not more than $k$ adversarial nodes) anywhere in the network is detected by the proposed data aggregation scheme. The example shown in Fig. 4 takes care of the problem of the adversary capturing the last $(k+1)$ nodes.

Figure 4. Example of proposed aggregation scheme taking care of last k+1 nodes.

The step that must be added to the initialization phase of the proposed scheme to accommodate the change described in Fig. 4 is as follows (Steps 1 and 2 remain the same):

3. The last $k+1$ nodes are preloaded with a pairwise secret key shared with the BS (i.e. if $N=6$ and $k=2$, then Nodes 4, 5 and 6 share pairwise secret keys with the BS, denoted $K_{4,BS}$, $K_{5,BS}$ and $K_{6,BS}$, respectively).

The step that must be added to the working phase of the proposed scheme to accommodate the change described in Fig. 4 is as follows (Steps 1, 2, 3 and 4 remain the same):

5. The BS verifies the MAC values of the last $k+1$ nodes (i.e. if $N=6$ and $k=2$, then the BS verifies $t_{4,BS}$, $t_{5,BS}$ and $t_{6,BS}$).

Table 3 gives an example of a WSN with real values for the sensor readings. We apply the RC5 cipher [76] to generate the keyed MAC. RC5 takes 32, 64 or 128 bits as its block size, and its key size is between 0 and 2040 bits. However, to illustrate how the proposed scheme works on real values, we consider small key sizes. The MAC generation algorithm and the key size can be chosen according to the security requirements at hand.

Table 3. Illustrative example with real values.

| Reading ($d_i$) | Random shares ($r_i$) | Predistributed keys ($K_{i,j}$) | Aggregated values ($X_i$) | MAC ($t_{i,j}$) |
| --- | --- | --- | --- | --- |
| $d_1$ = 201 | $r_1$ = 522 | $K_{1,4}$ = 11 | $X_1$ = 723 | $t_{1,4}$ = db5e62b2 |
| $d_2$ = 401 | $r_2$ = 127 | $K_{2,5}$ = 22 | $X_2$ = 1251 | $t_{2,5}$ = a1192047 |
| $d_3$ = 197 | $r_3$ = 600 | $K_{3,6}$ = 115 | $X_3$ = 2048 | $t_{3,6}$ = 982ea940 |
| $d_4$ = 597 | $r_4$ = 50 | $K_{4,BS}$ = 99 | $X_4$ = 2695 | $t_{4,BS}$ = 3146d79f |
| $d_5$ = 101 | $r_5$ = 5 | $K_{5,BS}$ = 412 | $X_5$ = 2801 | $t_{5,BS}$ = c516a9d0 |
| $d_6$ = 51 | $r_6$ = 902 | $K_{6,BS}$ = 62 | $X_6$ = 3754 | $t_{6,BS}$ = 850f40ac |

The security for the aggregation scheme is proved mathematically.
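As a check on the arithmetic in Table 3, the aggregated values can be reproduced with a short Python sketch. It relies only on the working-phase recurrence $X_i = X_{i-1} + d_i + r_i$. The function name `aggregate` is hypothetical, and the final unmasking step (the BS subtracting $r = r_1 + \cdots + r_N$ from the last aggregate) is an assumption about how the BS recovers the sum of the readings. The RC5-based MAC column is not reproduced here.

```python
# Values copied from Table 3 (N = 6).
readings = [201, 401, 197, 597, 101, 51]   # d_1 .. d_6
shares = [522, 127, 600, 50, 5, 902]       # r_1 .. r_6


def aggregate(readings, shares):
    """Return the running obfuscated aggregates X_1 .. X_N."""
    aggregates = []
    running = 0
    for d, r in zip(readings, shares):
        running += d + r            # X_i = X_{i-1} + d_i + r_i
        aggregates.append(running)
    return aggregates


X = aggregate(readings, shares)

# Assumed unmasking at the BS: subtract r, the sum of all shares.
r = sum(shares)
total = X[-1] - r
```

Here `X` evaluates to the "Aggregated values" column of Table 3, ending in 3754, and `total` is 1548, which equals the sum of the six plaintext readings.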
Mathematical induction is used for the proof. Theorem 1 in Section 6.2 describes how resilience is achieved by the scheme shown in Fig. 3. The aggregation scheme requires storage of at most 2 keys per node, which is less than the other schemes (discussed in [41]) that provide the same level of resilience.

The variant of the proposed scheme uses homomorphic encryption instead of obfuscation. It provides all the security features provided by the proposed scheme. However, the use of homomorphic encryption makes it computationally expensive. An example of the variant with $N=6$ is shown in Fig. 5. As can be seen from the figure, symmetric key homomorphic encryption ($E_{HK}$), as discussed in [42], is used instead of random shares. Here, the sum of the symmetric keys used in homomorphic encryption by the nodes is $K$, i.e. $K_1+K_2+\cdots+K_N=K$. The BS uses $K$ for decryption.

Figure 5. Example of variant to the proposed aggregation scheme with k=2 (symmetric key).

The variant with public key homomorphic encryption is shown in Fig. 6. Here, each node uses the public key of the BS for homomorphic encryption ($E_{HKu}$). The BS ultimately decrypts the aggregated value with its private key. The security of the variant can be proved in the same way as that of the proposed scheme.

Figure 6. Example of variant to the proposed aggregation scheme with k=2 (public key).

6. THE COMMUNICATION & SECURITY ANALYSIS

In this section, we analyse the number of messages that must be communicated in the proposed aggregation scheme. The security of the scheme is also analysed.
The scheme uses the predistributed keys for data integrity verification. Smart meters are resource rich compared to wireless sensor nodes. However, they are not entirely resource rich, as they have limitations with regard to battery life, bandwidth, energy, etc., and are considered to belong to the category of IoT (Internet of Things) devices [77, 78].

6.1. Communication analysis

In this section, we analyse the number of messages each node communicates to the next node. The cost we consider for the protocol depends on the number of messages communicated by every node. Our analysis shows that this cost depends directly on the number of consecutive malicious nodes, i.e. $k$. We assume each node communicates over at least a single hop.

6.1.1. Local pairwise setting

In the local pairwise setting, each node shares a key with all the other nodes in the network. If we apply this setting to the data aggregation scheme, then each node $n_i$ has to send the current aggregate total and MAC values for the $(k+1)$ nodes at distances up to $(k+1)$. Therefore, node $n_i$ computes and sends the aggregate $X_i$ and $(k+1)$ MACs on $X_i$ to the neighbor node. This gives a total of $(k+2)$ messages computed and sent by every node. In addition to these messages, node $n_i$ is responsible for forwarding MAC values from the previous $k$ hops. The number of values to be forwarded depends on $k$ and grows up to $(k+1)$, as shown in the following equation:

$$2+3+\cdots+(k+1)=\frac{(k+1)(k+2)}{2}-1=\frac{k^2+3k}{2} \qquad (1)$$

Thus, adding the $(k+2)$ messages that the node itself generates, the communication cost in terms of the total number of messages for node $n_i$ is

$$\frac{k^2+3k}{2}+(k+2)=\frac{k^2+5k+4}{2} \qquad (2)$$

6.1.2. Our proposed scheme setting

In the proposed scheme, a node does not share a key with every other node in the network. Each node has at most 2 keys: one is used for MAC generation and the other for verification, as already discussed in Section 5.
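The bound of two keys per node can be checked with a small Python sketch. It assumes the key layout of Section 5.1: a key $K_{i,i+k+1}$ between nodes that are $k+1$ positions apart, plus a key shared with the BS for each of the last $k+1$ nodes. The function name `assign_keys` and the string labels are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def assign_keys(n, k):
    """Return {node: [key labels]} for a line of n nodes with parameter k."""
    if n < k + 1:
        raise ValueError("need at least k+1 nodes")
    keys = defaultdict(list)
    # Pairwise keys K_{i,i+k+1} for i = 1 .. n-(k+1).
    for i in range(1, n - k):
        j = i + k + 1
        label = f"K{i},{j}"
        keys[i].append(label)   # node i uses it to generate its MAC
        keys[j].append(label)   # node j uses it to verify that MAC
    # The last k+1 nodes share a key with the base station instead.
    for i in range(n - k, n + 1):
        keys[i].append(f"K{i},BS")
    return dict(keys)


keys = assign_keys(6, 2)
max_keys = max(len(labels) for labels in keys.values())
```

For $N=6$ and $k=2$ this reproduces the keys listed in Table 3: Node 4, for example, holds `K1,4` and `K4,BS`, and `max_keys` is 2.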
The number of messages communicated in the proposed scheme still depends on the value of $k$. However, it reduces to $O(2k)$, compared with $O(k^2)$ required in the local pairwise setting and $O(4k)$ in the ultra-lightweight setting discussed in [41]. The number of messages to be communicated per node in the proposed scheme is $2(k+1)$. Moreover, the proposed scheme handles collusion attacks and both insider and outsider adversaries, and provides verifiability, which the existing schemes do not consider. The variant of the proposed scheme that uses homomorphic encryption also requires $O(2k)$ messages per node. A comparison of the aggregation schemes with regard to the number of messages communicated is shown in Table 4.

Table 4. Comparison of communication cost of aggregation schemes.

| Aggregation scheme | Communication cost (per node) |
| --- | --- |
| Local PairWise Scheme | $O(k^2)$ |
| Ultra-lightweight Scheme proposed in [41] | $O(4k)$ |
| Our Proposed Scheme | $O(2k)$ |
| Variant to Our Proposed Scheme | $O(2k)$ |

6.2. Security analysis

The proposed aggregation scheme counters both active and passive attacks. The countermeasures against the security attacks defined in Section 4.2 are proved with the help of the following theorems.

Theorem 1. Suppose the protocol shown in Fig. 3 is applied to smart grid data aggregation, at most $k=2$ consecutive meters are captured (dishonest), and meters $m_1,\ldots,m_{i-1}$ do not output reject. Then $m_i$ either outputs reject or correctly computes $X_i=\sum_{j=1}^{i}(d_j+r_j)$, where $d_1,\ldots,d_i\in\{0,\ldots,D-1\}$.

Proof. To prove the theorem, we use mathematical induction. Initially, take $i = 1, 2, 3, 4$ as the base case, as shown in Fig. 7.

Figure 7. Base case.

Now, assume all the honest meters $m_j$ for $1 \le j \le N$ can correctly compute $X_j=\sum_{l=1}^{j}(d_l+r_l)$, where $d_l\in\{0,\ldots,D-1\}$, as well as all the related MAC values, i.e. $t_{j-3,j}$ and $t_{j,j+3}$. Let us assume that meter $m_{i-1}$ is honest; then, by induction as shown in Fig. 8, the theorem can be proved.

Figure 8. Induction case.

As we can see, if meters $m_i$ and $m_{i+1}$ are compromised, then meter $m_{i+2}$ detects their malicious behavior by verifying the MAC generated by the honest meter $m_{i-1}$. This ensures that the scheme is resilient against $k$ consecutive dishonest meters.□

Theorem 2. Let $N$ be the total number of meters in the network. At most $(N-1)$ nodes (meters) can collude. The protocol is secure against such a node collusion attack.

Proof. Consider a network of $N$ smart meters, each having a random secret share. These shares are generated by the BS and distributed to the smart meters during the initialization phase of the protocol. The secret shares are generated based on the concept proposed in [70]. At any meter $m$, the smart meter holds the aggregation value shown in the following equation:

$$X_m=X_{m-1}+d_m+r_m \qquad (3)$$

Now, suppose $N-1$ meters collude to obtain the value of the uncompromised meter, i.e. all meters other than meter $m$ collude to obtain the data value $d_m$ of meter $m$.
However, as the random share of the uncompromised node ($r_m$) is not known, it is impossible to derive its data value ($d_m$). The addition of secret shares to the actual data ensures that the adversary cannot obtain the actual aggregated data unless all the nodes collude, i.e. a collusion attack works only when all $N$ nodes of the network collude, which is hard for any adversary to achieve. Therefore, the proposed data aggregation scheme is secure against node collusion attacks.□

Theorem 3. Let $N$ be the total number of meters in the network and $k$ be the number of consecutive meters that can be captured by the adversary. The maximum value of $k$ is $(N-2)$. If the adversary captures at most $k$ meters, then it cannot infer any information about a meter $i$ that is not captured.

Proof. Using the security game shown in Fig. 9, we show how the proposed scheme maintains the security of meter $i$'s data.

Figure 9. Security game.

The key predistribution algorithm proposed in [24] is called initially to set up the network of $N$ smart meters. These smart meters are also given random shares during the setup phase of the protocol. The adversary may communicate with meters and receive the aggregated readings. It can capture $k$ consecutive smart meters and gain access to their keys and random shares. In the challenge phase, the adversary selects two uncompromised meters $m_0$ and $m_1$. These two meters participate in the aggregation process, and there is at least one secure path between them (the keys are predistributed in such a manner that at least one secure path remains between two uncompromised meters). Then, the challenger gives the adversary the data of $m_r$, where $r$ is selected at random from $\{0,1\}$. The adversary outputs a guess bit $r'$ and succeeds if $r=r'$, i.e. if it has correctly determined the meter to which the reading belongs:

$$\mathrm{ADV}_{A,N}=2\cdot\Pr[A\ \text{wins}]-1.$$
(4)

Equation (4) gives the indistinguishability advantage of the adversary $A$ in attacking a network of $N$ meters. To prove the theorem, we show that this advantage is zero. The worst case occurs when $k=N-2$. In that case, there are two uncompromised meters $m_i$ and $m_j$, corresponding to the meters $m_0$ and $m_1$ of the security game shown in Fig. 9. At any meter $t$, the aggregated value is as shown in the following equation:

$$X_t=d_t+r_t+X_{t-1} \qquad (5)$$

Now, as $m_i$ and $m_j$ are not compromised, the values of $d_i, r_i, d_j, r_j$ are not known to the adversary. The aggregate values $X_i$ and $X_j$ are as shown in Equations (6) and (7), respectively:

$$X_i=d_i+r_i+X_{i-1} \qquad (6)$$

$$X_j=d_j+r_j+X_{j-1} \qquad (7)$$

As there are $(N-2)$ captured meters, it is possible to derive the values of $X_{i-1}$ and $X_{j-1}$. However, as meters $m_i$ and $m_j$ are not compromised, it is impossible to derive the values of the four unknowns $d_i, r_i, d_j, r_j$ from the two Equations (6) and (7). If the oracle gives one of the data values, i.e. $d_i$ or $d_j$, without specifying which meter it belongs to, it is not possible to distinguish between the two meters, as the random shares of the uncompromised meters are not known. Therefore, we can conclude that the data of an uncompromised meter $m_i$ remain secure as long as there is at least one more meter $m_j$ that is not compromised.□

6.2.1. Secure data aggregation schemes: comparison of security parameters

A comparison of secure data aggregation schemes with the proposed scheme is shown in Table 5. We consider the following security parameters for the comparison:

Aggr.: Hop-by-hop aggregation
Priv.: Privacy
Veri-I: Verifiability at Intermediate nodes
Veri-E: Verifiability End-to-End
Resilience: Resilience against node capture
CA: Collusion Avoidance

Table 5. Comparison of secure data aggregation schemes.

| Scheme | Aggr. | Priv. | Veri-I | Veri-E | Resilience | CA |
| --- | --- | --- | --- | --- | --- | --- |
| Ruj and Nayak [26] | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Lu et al. [25] | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| Cheung et al. [28] | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ |
| Fan et al. [29] | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ |
| Kalogridis et al. [30] | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Efthymiou and Kalogridis [31] | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Fhom et al. [32] | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Budka et al. [33] | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Kursawe et al. [34] | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ |
| Chim et al. [35] | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Kalogridis and Denic [36] | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Rajagopalan et al. [37] | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Girao et al. [46] | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Castelluccia et al. [42] | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Castelluccia et al. [43] | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Parmar and Jinwala [27] | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Proposed | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

7. CONCLUSIONS AND FUTURE DIRECTIONS

We proposed an aggregation scheme that provides the following security features:

Privacy preservation of data being communicated.
The integrity of data at intermediate nodes as well as at the BS (Verification).
Resilience against the node capture attacks.
Avoidance against collusion attacks.

The proposed aggregation scheme achieves all of these security features with a lower key storage requirement and fewer messages communicated. The scheme is resilient against active and passive adversaries. The mathematical analysis and associated proofs showed that the proposed scheme is resilient against node capture. Moreover, we proved the security of the proposed aggregation scheme with the help of the security game of meter indistinguishability. The proposed scheme avoids collusion attacks, as it is based on the idea of using secret shares. The variant of the proposed scheme uses homomorphic encryption, which is computationally expensive. However, it can be applied in scenarios where computation is not much of a constraint.

7.1. Future directions

We applied the proposed scheme to networks that form a square grid, i.e.
the objects being monitored form a square grid. We also discussed real-world applications that form such a topology. How the proposed scheme will work in more complex scenarios, viz. applications that form routing topologies such as random, clustered or tree, is left as future work. The proposed scheme cannot be applied directly to such complex networks, as the routing and communication patterns are different. Moreover, the placement of nodes also differs from that of grid-based networks. However, the proposed scheme can be extended to work for subsets of such complex routing topologies. For example, in clustered or tree-based topologies, there are subsets of nodes or cluster heads that form linear or grid-based networks.

REFERENCES

1 Yick, J., Mukherjee, B. and Ghosal, D. (2008) Wireless sensor network survey. Comput. Netw., 52, 2292–2330.
2 Akyildiz, I.F., Su, W., Sankarasubramaniam, Y. and Cayirci, E. (2002) Wireless sensor networks: a survey. Comput. Netw., 38, 393–422.
3 Edu, H. (2004) Tmotesky: Low Power Wireless Sensor Module. http://www.eecs.harvard.edu/konrad/projects/shimmer/references/tmote-sky-datasheet.pdf (accessed November 25, 2004).
4 Zolertia (2013) Z1 Features: Quick hardware tour. http://zolertia.sourceforge.net/wiki/index.php/Z1 (accessed July 26, 2013).
5 Digikey (2007) Choosing an MCU for Smart Energy Meters. http://www.digikey.com/en/articles/techzone/2012/jul/choosing-an-mcu-for-smart-energy-meters (accessed July 11, 2012).
6 Krishnamachari, L., Estrin, D. and Wicker, S. (2002) The Impact of Data Aggregation in Wireless Sensor Networks. 22nd Int. Conf. Distributed Computing Systems Workshops, Vienna, Austria, July, pp. 575–578. IEEE.
7 Ozdemir, S. and Xiao, Y. (2009) Secure data aggregation in wireless sensor networks: a comprehensive overview. Comput. Netw., 53, 2022–2037.
8 Perrig, A., Stankovic, J.
and Wagner, D. (2004) Security in wireless sensor networks. Commun. ACM, 47, 53–57.
9 Du, W., Deng, J., Han, Y.S., Varshney, P.K., Katz, J. and Khalili, A. (2005) A pairwise key predistribution scheme for wireless sensor networks. ACM Trans. Inf. Syst. Secur., 8, 228–258.
10 Yagan, O. and Makowski, A.M. (2013) Modeling the pairwise key predistribution scheme in the presence of unreliable links. IEEE Trans. Inf. Theory, 59, 1740–1760.
11 Yavuz, F., Zhao, J., Yagan, O. and Gligor, V. (2014) On Secure and Reliable Communications in Wireless Sensor Networks: Towards k-Connectivity under a Random Pairwise Key Predistribution Scheme. Int. Symp. Information Theory (ISIT), Honolulu, HI, USA, August, pp. 2381–2385. IEEE.
12 Kalogridis, G., Fan, Z. and Basutkar, S. (2011) Affordable Privacy for Home Smart Meters. 9th Int. Symp. Parallel and Distributed Processing with Applications Workshops (ISPAW), Busan, South Korea, July, pp. 77–84. IEEE.
13 De Meulenaer, G., Gosset, F., Standaert, F.-X. and Pereira, O. (2008) On the Energy Cost of Communication and Cryptography in Wireless Sensor Networks. Int. Conf. Wireless and Mobile Computing, Networking and Communications, Avignon, France, October, pp. 580–585. IEEE.
14 Shah, K. and Jinwala, D.C. (2016) A Secure Expansive Aggregation in Wireless Sensor Networks for Linear Infrastructure. Region 10 Symposium (TENSYMP), Bali, Indonesia, July, pp. 207–212. IEEE.
15 He, W., Liu, X., Nguyen, H., Nahrstedt, K. and Abdelzaher, T. (2007) Pda: Privacy-Preserving Data Aggregation in Wireless Sensor Networks. 26th Int. Conf. Computer Communications, Barcelona, Spain, May, pp. 2045–2053. IEEE.
16 Chan, H. and Perrig, A. (2003) Security and privacy in sensor networks. Computer, 36, 103–105.
17 Domingo-Ferrer, J.
(2002) A Provably Secure Additive and Multiplicative Privacy Homomorphism. Int. Conf. Information Security, Berlin, Heidelberg, September, pp. 471–483. Springer.
18 i Ferrer, J.D. (1996) A new privacy homomorphism and applications. Inf. Process. Lett., 60, 277–282.
19 Fontaine, C. and Galand, F. (2007) A survey of homomorphic encryption for nonspecialists. EURASIP J. Inf. Secur., 2007, 1–10.
20 Chan, A.C. and Castelluccia, C. (2008) On the (im) Possibility of Aggregate Message Authentication Codes. Int. Symp. Information Theory, Toronto, ON, Canada, August, pp. 235–239. IEEE.
21 Agrawal, S. and Boneh, D. (2009) Homomorphic macs: Mac-Based Integrity for Network Coding. Int. Conf. Applied Cryptography and Network Security, France, June, pp. 292–305. Springer.
22 Westhoff, D. and Ugus, O. (2013) Malleability Resilient (Premium) Concealed Data aggregation. 14th Int. Symp. Wkshp. World of Wireless, Mobile and Multimedia Networks (WoWMoM), Madrid, Spain, August, pp. 1–6. IEEE.
23 Martin, K.M. and Paterson, M.B. (2009) Ultra-Lightweight Key Predistribution in Wireless Sensor Networks for Monitoring Linear Infrastructure. IFIP Int. Wkshp. Information Security Theory and Practices, Belgium, September, pp. 143–152. Springer.
24 Shah, K.A. and Jinwala, D.C. (2017) Novel approach for pre-distributing keys in wsns for linear infrastructure. Wireless Pers. Commun., 95, 3905–3921.
25 Lu, R., Liang, X., Li, X., Lin, X. and Shen, X.S. (2012) Eppa: an efficient and privacy-preserving aggregation scheme for secure smart grid communications. IEEE Trans. Parall. Distr. Syst., 23, 1621–1631.
26 Ruj, S. and Nayak, A. (2013) A decentralized security framework for data aggregation and access control in smart grids. IEEE Trans. Smart Grid, 4, 196–205.
27 Parmar, K. and Jinwala, D.C.
(2016) Malleability resilient concealed data aggregation in wireless sensor networks. Wireless Personal Commun., 87, 971–993.
28 Cheung, J.C., Chim, T.W., Yiu, S.-M., Li, V.O. and Hui, L.C. (2011) Credential-Based Privacy-preserving Power Request Scheme for Smart Grid Network. Global Telecommun. Conf. (GLOBECOM), Kathmandu, Nepal, January, pp. 1–5. IEEE.
29 Fan, C.-I., Huang, S.-Y. and Lai, Y.-L. (2014) Privacy-enhanced data aggregation scheme against internal attackers in smart grid. IEEE Trans. Ind. Informat., 10, 666–675.
30 Kalogridis, G., Efthymiou, C., Denic, S.Z., Lewis, T. and Cepeda, R. (2010) Privacy for Smart Meters: Towards Undetectable Appliance Load Signatures. 1st Int. Conf. Smart Grid Communications (SmartGridComm), Gaithersburg, MD, USA, November, pp. 232–237. IEEE.
31 Efthymiou, C. and Kalogridis, G. (2010) Smart Grid Privacy via Anonymization of Smart Metering Data. 1st Int. Conf. Smart Grid Communications (SmartGridComm), Gaithersburg, MD, USA, November, pp. 238–243. IEEE.
32 Fhom, H.S., Kuntze, N., Rudolph, C., Cupelli, M., Liu, J. and Monti, A. (2010) A User-centric Privacy Manager for Future Energy Systems. Int. Conf. Power System Technology (POWERCON), Hangzhou, China, December, pp. 1–7. IEEE.
33 Budka, K. et al. (2010) Geri-Bell Labs Smart Grid Research Focus: Economic Modeling, Networking, and Security & Privacy. 1st Int. Conf. Smart Grid Communications (SmartGridComm), Gaithersburg, MD, USA, November, pp. 208–213. IEEE.
34 Kursawe, K., Danezis, G. and Kohlweiss, M. (2011) Privacy-Friendly Aggregation for the Smart-Grid. Privacy Enhancing Technologies, Waterloo, ON, Canada, July, pp. 175–191. Springer.
35 Chim, T.W., Yiu, S.-M., Hui, L.C. and Li, V.O. (2011) Pass: Privacy-Preserving Authentication Scheme for Smart Grid Network. Int. Conf. Smart Grid Communications (SmartGridComm), Brussels, Belgium, December, pp. 196–201. IEEE.
36 Kalogridis, G.
and Denic, S.Z. ( 2011) Data Mining and Privacy of Personal Behaviour Types in Smart Grid. 11th Int. Conf. Data Mining Workshops (ICDMW), Vancouver, BC, Canada, January, pp. 636–642. IEEE. 37 Rajagopalan, S.R., Sankar, L., Mohajer, S. and Poor, H.V. ( 2011) Smart Meter Privacy: A Utility-privacy Framework. Int. Conf. Smart Grid Communications (SmartGridComm), Brussels, Belgium, December, pp. 190–195. IEEE. 38 Blackburn, S.R., Etzion, T., Martin, K.M. and Paterson, M.B. ( 2008) Efficient Key Predistribution for Grid-based Wireless Sensor Networks. Int. Conf. Information Theoretic Security, Canada, August, pp. 54–69. Springer. 39 Mohaisen, A. and Nyang, D.-H. ( 2006) Hierarchical Grid-based Pairwise Key Predistribution Scheme for Wireless Sensor Networks. Eur. Wkshp. Wireless Sensor Networks, Berlin, Heidelberg, February, pp. 83–98. Springer. 40 Mohaisen, A., Maeng, Y. and Nyang, D. ( 2007) On Grid-based Key Pre-distribution: Toward a Better Connectivity in Wireless Sensor Network. Pacific-Asia Conf. Knowledge Discovery and Data Mining, China, May, pp. 527–537. Springer. 41 Henry, K.J. and Stinson, D.R. ( 2014) Resilient aggregation in simple linear sensor networks. IACR Cryptol. ePrint Arch. , 2014, 288– 310. 42 Castelluccia, C., Mykletun, E. and Tsudik, G. ( 2005) Efficient Aggregation of Encrypted Data in Wireless Sensor Networks. 2nd Annu. Int. Conf. Mobile and Ubiquitous Systems: Networking and Services, San Diego, CA, USA, November, pp. 109–117. IEEE. 43 Castelluccia, C., Chan, A.C., Mykletun, E. and Tsudik, G. ( 2009) Efficient and provably secure aggregation of encrypted data in wireless sensor networks. ACM Trans. Sen. Netw. , 5, 20. Google Scholar CrossRef Search ADS   44 Malan, D.J., Welsh, M. and Smith, M.D. ( 2004) A Public-key Infrastructure for Key Distribution in Tinyos Based on Elliptic Curve Cryptography. 2004 1st Annu. IEEE Commun. Soc. Conf. Sensor and Ad Hoc Communications and Networks, 2004. IEEE SECON 2004, Santa Clara, CA, USA, January, pp. 
71–80. IEEE. 45 Rivest, R.L., Adleman, L. and Dertouzos, M.L. ( 1978) On data banks and privacy homomorphisms. Foundat. Secur. Computat. , 4, 169– 180. 46 Girao, J., Westhoff, D. and Schneider, M. ( 2005) Cda: Concealed Data Aggregation for Reverse Multicast Traffic in Wireless Sensor Networks. Int. Conf. Communications (ICC), Seoul, South Korea, August, pp. 3044–3049. IEEE. 47 Westhoff, D., Girao, J. and Acharya, M. ( 2006) Concealed data aggregation for reverse multicast traffic in sensor networks: encryption, key distribution, and routing adaptation. IEEE Trans. Mob. Comput. , 5, 1417– 1431. Google Scholar CrossRef Search ADS   48 Peter, S., Piotrowski, K. and Langendoerfer, P. ( 2007) On Concealed Data Aggregation for WSNs. IEEE Consum. Commun. Network. Conf., Las Vegas, NV, USA, May, pp. 192–196. IEEE. 49 Mykletun, E., Girao, J. and Westhoff, D. ( 2006) Public Key Based Cryptoschemes for Data Concealment in Wireless Sensor Networks. 2006 IEEE Int. Conf. Commun., Istanbul, Turkey, December, pp. 2288–2295. IEEE. 50 Parmar, K. and Jinwala, D.C. ( 2014) Malleability Resilient Concealed Data Aggregation. Meeting of the European Network of Universities and Companies in Information and Communication Engineering, Rennes, France, September, pp. 160–172. Springer. 51 Di Pietro, R., Michiardi, P. and Molva, R. ( 2009) Confidentiality and integrity for data aggregation in wsn using peer monitoring. Secur. Commun. Netw. , 2, 181– 194. Google Scholar CrossRef Search ADS   52 Bagaa, M., Challal, Y., Ouadjaout, A., Lasla, N. and Badache, N. ( 2012) Efficient data aggregation with in-network integrity control for wsn. J. Parallel Distrib. Comput. , 72, 1157– 1170. Google Scholar CrossRef Search ADS   53 Chen, C.-M., Lin, Y.-H., Chen, Y.-H. and Sun, H.-M. ( 2013) Sashimi: secure aggregation via successively hierarchical inspecting of message integrity on wsn. J. Inf. Hiding Multimedia Signal Process. , 4, 57– 72. 54 Perrig, A., Szewczyk, R., Tygar, J.D., Wen, V. 
and Culler, D.E. ( 2002) Spins: security protocols for sensor networks. Wireless Netw. , 8, 521– 534. Google Scholar CrossRef Search ADS   55 Karlof, C., Sastry, N. and Wagner, D. ( 2004) Tinysec: A Link Layer Security Architecture for Wireless Sensor Networks. Proc. 2nd Int. Conf. Embedded networked sensor systems, New York, NY, USA, November, pp. 162–175. ACM. 56 Salehi, S.A., Razzaque, M., Naraei, P. and Farrokhtala, A. ( 2013) Security in Wireless Sensor Networks: Issues and Challanges. Int. Conf. Space Science and Communication (IconSpace), Melaka, Malaysia, September, pp. 356–360. IEEE. 57 Pathan, A.-S.K., Lee, H.-W. and Hong, C.S. ( 2006) Security in Wireless Sensor Networks: Issues and Challenges. 8th Int. Conf. Advanced Communication Technology, Phoenix Park, South Korea, May, pp. 6–pp. IEEE. 58 Sang, Y., Shen, H., Inoguchi, Y., Tan, Y. and Xiong, N. ( 2006) Secure Data Aggregation in Wireless Sensor Networks: A Survey. 7th Int. Conf. Parallel and Distributed Computing, Applications and Technologies (PDCAT'06), Taipei, Taiwan, December, pp. 315–320. IEEE. 59 Chen, C.-M., Lin, Y.-H., Lin, Y.-C. and Sun, H.-M. ( 2012) Rcda: recoverable concealed data aggregation for data integrity in wireless sensor networks. IEEE Trans. Parallel Distrib. Syst. , 23, 727– 734. Google Scholar CrossRef Search ADS   60 Ozdemir, S. ( 2007) Concealed Data Aggregation in Heterogeneous Sensor Networks Using Privacy Homomorphism. Int. Conf. Pervasive Services, Istanbul, Turkey, August, pp. 165–168. IEEE. 61 Liu, C.-X., Liu, Y., Zhang, Z.-J. and Cheng, Z.-Y. ( 2013) High energy-efficient and privacy-preserving secure data aggregation for wireless sensor networks. Int. J. Commun. Syst. , 26, 380– 394. Google Scholar CrossRef Search ADS   62 Laughman, C., Lee, K., Cox, R., Shaw, S., Leeb, S., Norford, L. and Armstrong, P. ( 2003) Power signature analysis. IEEE Power Energ. Mag. , 1, 56– 63. Google Scholar CrossRef Search ADS   63 Cho, H.S., Yamazaki, T. and Hahn, M. 
( 2010) Aero: extraction of user’s activities from electric power consumption data. IEEE Trans. Consum. Electron. , 56, 2011– 2018. Google Scholar CrossRef Search ADS   64 Lisovich, M. and Wicker, S. ( 2008) Privacy Concerns in Upcoming Residential and Commercial Demand-Response Systems. IEEE Proc. Power Systems, Clemson, USA, March, pp. 1–10. IEEE. 65 McDaniel, P. and McLaughlin, S. ( 2009) Security and privacy challenges in the smart grid. IEEE Security Privacy , 7, 75– 77. Google Scholar CrossRef Search ADS   66 Hart, G.W. ( 1992) Non-intrusive appliance load monitoring. Proc. IEEE , 80, 1870– 1891. Google Scholar CrossRef Search ADS   67 Lam, H., Fung, G. and Lee, W. ( 2007) A novel method to construct taxonomy electrical appliances based on load signaturesof. IEEE Trans. Consum. Electron. , 53, 653– 660. Google Scholar CrossRef Search ADS   68 Erol-Kantarci, M. and Mouftah, H.T. ( 2015) Energy-efficient information and communication infrastructures in the smart grid: a survey on interactions and open issues. IEEE Commun. Surveys Tuts. , 17, 179– 197. Google Scholar CrossRef Search ADS   69 Mahmood, A., Javaid, N. and Razzaq, S. ( 2015) A review of wireless communications for smart grid. Renewable Sustainable Energy Rev. , 41, 248– 260. Google Scholar CrossRef Search ADS   70 Shamir, A. ( 1979) How to share a secret. Commun. ACM , 22, 612– 613. Google Scholar CrossRef Search ADS   71 Jaggi, S., Langberg, M., Katti, S., Ho, T., Katabi, D. and Médard, M. ( 2007) Resilient Network Coding in the Presence of Byzantine Adversaries. INFOCOM 2007 26th Int. Conf. Computer Communications, Barcelona, Spain, May, pp. 616–624. IEEE. 72 Hu, L. and Evans, D. ( 2003) Secure Aggregation for Wireless Networks. Symp. Applications and the Internet Workshops, Orlando, FL, USA, July, pp. 384–391. IEEE. 73 Yang, Y., Wang, X., Zhu, S. and Cao, G. ( 2008) Sdap: a secure hop-by-hop data aggregation protocol for sensor networks. ACM Trans. Inf. Syst. Security , 11, 18. 
© The British Computer Society 2018. All rights reserved. For Permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) TI - Privacy Preserving, Verifiable and Resilient Data Aggregation in Grid-Based Networks JF - The Computer Journal DO - 10.1093/comjnl/bxy013 DA - 2018-04-01 UR - https://www.deepdyve.com/lp/oxford-university-press/privacy-preserving-verifiable-and-resilient-data-aggregation-in-grid-DP2eyRuQPD SP - 614 EP - 628 VL - 61 IS - 4 DP - DeepDyve ER -