TY - JOUR AU - Jiang, Kaiwei AB - Abstract In this research, we consider the allocation of spectrum sensing time and transmit power for a cognitive sensor node with energy harvesting capability, operating in a time-slotted fashion with causal knowledge of the channel state and the energy harvesting state. Taking into account the occupancy status of the primary channel and imperfect sensing, we formulate this resource allocation problem as an infinite-horizon discrete-time Markov decision process (MDP) in which the cognitive sensor node aims to maximize the long-term expected throughput. We propose an optimal sensing-transmission (OST) policy that specifies the time allocated to spectrum sensing as well as the power level used for transmission. We reveal a structural property of the OST policy: the optimal long-term expected throughput is non-decreasing in the energy available in the battery. Moreover, we study a special case in which the signal-to-noise ratio (SNR) of the primary signal is sufficiently high. We demonstrate that the optimal transmit power has a monotonic structure with respect to the battery energy state. Based on this monotonic structure, an efficient low-complexity sensing-transmission algorithm is developed. Simulation results are presented to confirm the theoretical analysis and the advantage of the proposed policies. 1. INTRODUCTION In recent years, with the rapid growth of wireless applications, the bandwidth demand on the limited spectrum resource is expected to keep growing. However, under the current fixed spectrum allocation approach, which has licensed most of the spectrum for specific applications, a large portion of the licensed spectrum is severely underutilized over vast temporal and geographic expanses [1].
In order to address the spectrum scarcity issue under the current fixed spectrum allocation policy, cognitive radio (CR) [2–4], which enables opportunistic spectrum sharing between licensed (primary) users (PUs) and unlicensed (secondary) users (SUs), is considered one of the key technologies for enhancing spectrum efficiency and has received increasing attention. Wireless sensor networks (WSNs) have become a prevalent solution for a wide range of applications including environmental and traffic monitoring, patient monitoring and smart homes [5]. Typically, most WSNs operate on unlicensed fixed spectrum for data transmission. Due to the coexistence of various emerging networking standards, particularly IEEE 802.11, Bluetooth (IEEE 802.15.1) and WSNs themselves, the unlicensed spectrum has become saturated [6]. It is therefore imperative to employ CR in WSNs and exploit dynamic spectrum access techniques so as to alleviate the spectral congestion problem. Such networks are referred to as cognitive radio sensor networks (CRSNs), in which sensors are SUs and primary network subscribers are PUs [7]. As indicated in [8, 9], CRSNs can help to reduce congestion and excessive packet loss and thereby make transmissions more reliable [10]. Besides, CR-enabled sensor networks are particularly suited to industrial sensor networks, which have recently been proposed for smart grid applications [11]. In addition to these potential benefits, incorporating CR also introduces many new research issues in CRSNs. In many practical situations, a cognitive sensor is a battery-powered device that performs essential operations such as spectrum sensing and access, which consume a large amount of energy.
Although energy saving and battery replacement or recharging can extend the operating time of sensor nodes to a certain extent, such techniques usually incur high cost and involve inconvenient or even impossible operations in some cases [12]. One emerging technology targeting energy-constrained SUs is energy harvesting (EH) [13]. Compared with traditional battery-powered devices, EH-powered devices are able to collect a potentially unlimited supply of energy from either radio frequency signals or ambient environmental energy sources, which enables them to function for a potentially infinite lifetime without battery replacement [14]. In addition, powering sensor nodes with harvested energy could also reduce the carbon emissions caused by energy consumption and make wireless networks more environmentally friendly. Moreover, as hardware advances lead to more efficient energy conversion and storage, EH is expected to play an increasingly important role in power-limited wireless networks [15]. Therefore, incorporating EH in CRSNs is a promising way to enhance both spectrum utilization and energy efficiency. In this research, an EH CR sensor network consisting of a pair of cognitive sensor nodes, denoted as SUs, is considered. The SU is equipped with a finite-capacity battery and is solely powered by renewable energy sources. We jointly optimize the duration of spectrum sensing and the allocation of transmit power, with the goal of maximizing the long-term expected throughput of the SU under the energy causality constraint. The energy causality constraint stipulates that, at each time instant, the cumulative energy expenditure cannot exceed the total amount of energy harvested so far. In order to obtain the optimal performance of the SU, multiple non-trivial tradeoffs need to be jointly considered.
Concretely, because sensing is imperfect, spending a longer duration on spectrum sensing gives the sensor node a more accurate outcome concerning the status of the licensed spectrum, allowing it to make a more reliable decision and possibly improve performance. However, this increases both the time and the energy consumed by sensing, leaving less time and energy for the subsequent data transmission. Besides, due to the dynamics of the energy harvested from the environment, the SU has to adaptively choose the transmit power to cope with fluctuations in energy supply as well as channel fading. When the channel condition is expected to be good, the SU should raise the transmit power level in order to obtain a higher throughput. However, an overly aggressive allocation of transmit power may lead to premature energy depletion in the battery before the next recharge cycle. On the contrary, when the channel condition is expected to be bad, the SU should decrease the transmit power level so as to save more energy for future use. But an overly conservative transmit power allocation may cause energy overflow, failing to fully exploit the harvested energy. The primary goal of the EH-enabled SU is to maximize the long-term throughput by adapting the sensing duration and transmit power to the SU's current knowledge of the energy available in the battery, the channel status and the state of the harvested energy. The main contributions of this work are as follows: Allowing for sensing imperfection, channel condition variation and the fluctuation of harvested energy, we formulate the resource allocation problem as an infinite-horizon discrete-time Markov decision process (MDP), in which the SU focuses on maximizing the long-term expected throughput.
Both the sensing time and the transmit power level are adapted to the battery state, the channel conditions and the dynamic energy arrivals so as to achieve the maximum throughput. For the general case, we propose an optimal sensing-transmission (OST) policy obtained by applying value iteration to the MDP. In addition, we prove that the long-term expected throughput obtained by OST is non-decreasing in the energy available in the battery, which is validated by numerical results. For the special case in which the signal-to-noise ratio (SNR) of the PU signal is sufficiently high, we prove that the optimal transmit power has a monotonic structure with respect to the energy available in the battery. Based on this monotonic structure, we introduce an efficient sensing-transmission (EST) policy, which has lower computational complexity than the OST policy. It is demonstrated that the EST policy achieves the same performance as the proposed OST policy in this special case. We study the performance of our proposed policies via simulation and demonstrate that the OST policy achieves a significant gain over several alternative schemes, and that the EST policy achieves the same performance as the OST policy when the primary SNR is sufficiently high. The outline of this paper is as follows. An overview of state-of-the-art EH and CR techniques is presented in Section 2. Section 3 states the network model, and we formulate the resource allocation problem in Section 4. The devised policies are introduced in Section 5. Simulation results are provided in Section 6. Section 7 concludes the paper. 2. RELATED WORK First, many recent works have studied different problems in wireless systems with energy harvesting [12, 16–23]. References [12, 16–19] investigate wireless communications over a single link. Zhao et al.
[12] considered joint wireless power and information transfer in a point-to-point communication system and sought a balance between energy harvesting and information transfer so as to maximize the throughput. Ozel et al. [16] aimed to maximize the throughput by a deadline and to minimize the transmission completion time by optimizing the time sequence of transmit powers. An offline policy and an online policy were proposed, which can be obtained by a directional water-filling algorithm and by dynamic programming, respectively. With either causal or full side information available, Ho et al. [17] studied the problem of maximizing the throughput via energy allocation over a finite horizon of time slots. In an EH wireless sensor network, in order to maximize the expected total amount of transmitted data within a finite horizon of time slots, Mao et al. [18] investigated the problem of energy management between sensing and transmission. The problem considered in [18] was further extended to an infinite-horizon case by Mao et al. [19], where it is formulated as a Markov decision process. Ahmed et al. [20] studied how to minimize the energy drawn from the constant energy source while completing the transmission of a given amount of data within a given number of time slots. Mao et al. [21] studied the optimal transmission scheduling problem in composite energy supply networks with the save-then-transmit protocol. By employing the harvest-use protocol, Siddiqui et al. [22] studied the optimization of power scheduling so as to improve energy efficiency. By adapting the transmission parameters to the channel variation as well as the battery recharge, Ku et al. [23] presented a data-driven approach to finding optimal policies for a sensing node with solar energy supply. Second, the integration of CR and EH to simultaneously improve spectral and energy efficiency has attracted tremendous interest. Yin et al.
[24] investigated an EH-CR system where a time slot is divided into three non-overlapping fractions for three distinct operations. Two types of fusion rules are considered and the harvesting-sensing-throughput tradeoff is explored. Allowing for stochastic harvested energy, channel qualities and the belief state of the primary network, Pradha et al. [25] introduced a channel selection criterion applicable to a single-user multi-channel setting. Based on the battery recharge and the belief states of the primary occupancy, Sultan [26] investigated the energy management policy that maximizes throughput. Gao et al. [27] mainly focused on a suboptimal energy allocation approach based on the dynamic spectrum state, the harvested energy and the channel fading level. Park et al. [28, 29] investigated the optimal sensing policy for an EH CR network so as to maximize the expected total throughput under energy causality and collision constraints; subsequently, Park et al. [30] derived an upper bound on the achievable throughput as a function of the energy arrival rate, the statistical behavior of the primary channel status and the detection threshold. In order to achieve both energy efficiency and spectrum efficiency, Han [31] studied the joint optimization of energy harvesting and spectrum sensing under the energy causality constraint, the collision constraint and the probability of sensing channel. Compared with the above works, in this paper we attempt to maximize the long-term expected throughput by adapting the channel sensing and the transmit power to changes in the energy available in the battery, the channel fading and the battery recharge process. 3. NETWORK MODEL 3.1. Primary network model Assume that the primary network employs a synchronous slotted communication protocol with time slot duration T. The PU owns the usage right of a spectrum band, and the status of the spectrum in time slot t is denoted by θt∈{0 (occupied), 1 (idle)}.
We model the primary traffic as a two-state time-homogeneous random process in which the channel randomly switches between the occupied and idle states, as assumed in [27]. We then define po≜Pr(θt=0) as the probability that the channel is occupied by the PU, and pi≜Pr(θt=1)=1−po as the probability that the channel is idle. We assume that the secondary user knows pi and po through long-term spectrum usage measurements [28, 30]. 3.2. Cognitive radio sensor network model 3.2.1. Opportunistic spectrum access and energy model Consider a secondary communication link comprising two cognitive sensor nodes, where a source node intends to convey data packets to its sink node. These cognitive sensor nodes are also referred to as SUs throughout this paper. Since the primary user has priority in utilizing the spectrum, the source SU periodically performs spectrum sensing to avoid colliding with the primary transmission, and transmits data only if no PU signal is detected. Correspondingly, each time slot is composed of two phases, namely the channel acquisition phase with duration αtT and the transmission phase with duration (1−αt)T, as shown in Fig. 1. We refer to αt as the spectrum sensing overhead of time slot t, which can be adjusted to enhance the system performance. In the channel acquisition phase, the SU senses the status of the spectrum using the energy detection technique. Since the complexity is roughly linear in the sensing duration [26], the energy consumption es of spectrum sensing is proportional to the sensing duration with a constant of proportionality ps, i.e. es(αt)=αtTps, where ps can be interpreted as the sensing power. If the primary channel is detected to be occupied, the SU turns off its transceiver and remains silent until the next slot without transmitting data [23, 30]. Otherwise, the SU transmits data, consuming ed(αt,pt)=(1−αt)Tpt in the data transmission phase, where pt is the transmit power.
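The per-slot energy budget described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the numerical values in the usage note are hypothetical, and the code simply evaluates es(αt)=αtTps and ed(αt,pt)=(1−αt)Tpt, adding the transmission energy only when the channel is sensed idle.

```python
def sensing_energy(alpha, slot_T, p_s):
    """Energy spent on sensing: e_s(alpha) = alpha * T * p_s."""
    return alpha * slot_T * p_s


def transmit_energy(alpha, slot_T, p_t):
    """Energy spent on transmission: e_d(alpha, p_t) = (1 - alpha) * T * p_t."""
    return (1.0 - alpha) * slot_T * p_t


def slot_energy(alpha, slot_T, p_s, p_t, sensed_idle):
    """Total energy drawn in one slot.

    The SU always pays for sensing; it pays for transmission only when the
    sensing outcome is idle (otherwise it stays silent until the next slot).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    energy = sensing_energy(alpha, slot_T, p_s)
    if sensed_idle:
        energy += transmit_energy(alpha, slot_T, p_t)
    return energy
```

For instance, with the hypothetical values alpha = 0.2, T = 1, ps = 0.1 and pt = 0.5, calling slot_energy(0.2, 1.0, 0.1, 0.5, True) adds the sensing term 0.2·1·0.1 to the transmission term 0.8·1·0.5, while passing sensed_idle=False returns the sensing term alone.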
Figure 1. Illustration of the transmission process. If the sensing outcome is idle, the SU carries out data transmission. Otherwise, the SU remains silent until the next time slot. We consider that the SU does not have a fixed power supply and is solely powered by energy scavenged from the ambient environment. The energy made available by EH arrives randomly in each slot and is stored in a rechargeable battery of finite capacity. We assume that the arrival energy is a time-correlated process following a first-order discrete-time Markov model [19, 32]. We assume the SU employs the harvest-store-use model [33], in which a storage device gathers the harvested energy, and energy harvested in one time slot can be used only after it has been stored, i.e. from the next time slot onward. 3.2.2. Sensing errors and throughput During the channel acquisition phase, the SU detects the presence of the PU's signal by performing a hypothesis test with $H_0: \theta_t = 0$ (occupied) and $H_1: \theta_t = 1$ (idle). $H_0$ is the hypothesis that the PU is present, and $H_1$ is the hypothesis that the PU is absent. The accuracy of spectrum sensing is determined by the probability of false alarm $P_f$ and the probability of detection $P_d$, defined as $P_f = \Pr\{\hat{\theta}_t = 0 \mid \theta_t = 1\}$, (1) $P_d = \Pr\{\hat{\theta}_t = 0 \mid \theta_t = 0\}$, (2) where $\hat{\theta}_t = 0$ or $\hat{\theta}_t = 1$ indicates that the channel is observed to be occupied or idle, respectively. Considering a complex-valued primary signal and circularly symmetric complex Gaussian (CSCG) noise, the false alarm probability is given by [34] $P_f(\alpha_t) = Q\left(\sqrt{2\beta+1}\,Q^{-1}(\bar{P}_d) + \sqrt{\alpha_t T f_s}\,\beta\right)$, (3) where $\bar{P}_d$ is a pre-defined target detection probability which ensures sufficient protection to the PU.
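As a numerical illustration of Equation (3), the following Python sketch evaluates the false alarm probability for a given sensing overhead. It is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, beta is the primary SNR in linear scale (not dB), f_s is the sampling frequency, and the Gaussian tail function Q and its inverse are computed with the standard library (math.erfc and statistics.NormalDist).

```python
import math
from statistics import NormalDist


def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inverse(p):
    """Inverse of Q: the x with Q(x) = p, valid for 0 < p < 1."""
    return NormalDist().inv_cdf(1.0 - p)


def false_alarm_probability(alpha, slot_T, f_s, beta, target_pd):
    """Equation (3): P_f for sensing overhead alpha.

    alpha * slot_T * f_s is the number of samples collected while sensing;
    beta is the primary SNR in linear scale; target_pd is the target
    detection probability.
    """
    arg = (math.sqrt(2.0 * beta + 1.0) * q_inverse(target_pd)
           + math.sqrt(alpha * slot_T * f_s) * beta)
    return q_function(arg)
```

Because the second term inside Q grows with the square root of the number of samples, a longer sensing phase lowers the false alarm probability for a fixed target detection probability, which is the sensing side of the tradeoff discussed in the introduction.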
$\beta$ denotes the received signal-to-noise ratio (SNR) of the primary signal at the SU, which is also referred to as the primary SNR in this paper, and $f_s$ is the sampling frequency. The function $Q(\cdot)$ is $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \exp(-t^2/2)\,dt$. After detecting the primary channel, the SU performs channel estimation to obtain the channel condition. A small number of pilot signals are sent from the SU to the receiver to probe the channel quality [35]. To ensure sufficient protection to the primary transmission, the spectrum sensing time is usually much longer than the channel estimation time; hence, in our analysis, we omit the channel estimation time, similar to [25, 36]. If the primary channel is truly idle and the hypothesis test result is $H_1$, the achievable throughput for time slot $t$ is $u(\alpha_t, p_t, \gamma_t) = (1-\alpha_t)\,T \log\left(1 + \frac{p_t \gamma_t}{N_0}\right)$, (4) where $N_0$ is the noise power at the destination and $\gamma_t$ is the channel power. Otherwise, either because the SU abstains from transmission (i.e. $\hat{\theta}_t = 0$) or because of a collision with the primary transmission, the throughput for time slot $t$ is zero. 4. PROBLEM FORMULATION We formulate our optimal resource allocation problem as an MDP by specifying the decision epochs, system state, action set, state transitions and reward function. The decision epochs are $t \in \mathcal{T} = \{0,1,2,\ldots\}$. The system state is characterized by a three-tuple $s=(b,g,h) \in B \times G \times H$, where $B=\{0,1,2,\ldots,N_B-1\}$ is the set of battery states, $G=\{0,1,2,\ldots,N_G-1\}$ is the set of channel states, and $H=\{0,1,2,\ldots,N_H-1\}$ is the set of arrival energy states. Similar to [23], we quantize the energy in units of $e_u$, which denotes one energy quantum; therefore, in state $b \in B$, the total available energy in the battery is $b e_u$. As for the arrival energy, its actual value is taken from a candidate set $Q=\{Q_0,Q_1,\ldots,Q_{N_H-1}\}$, where $Q_k \in \mathbb{Z}^+$ [29], and Qk0}, (24) where $1_x$ denotes the indicator function, which takes value one if $x$ is true and zero otherwise.
The action set of the transmit power is expressed as $\hat{A}^{p}(s,a_\alpha) = \{0,1,2,\ldots,b-1_{b>0}\}$. (25) Correspondingly, the feasible action set is given by $\hat{A}_s = \{(a_\alpha,a_p) \mid a_\alpha = 1_{b>0},\ a_p \in \hat{A}^{p}(s,a_\alpha)\}$. (26) From (26), we observe that when the battery is completely depleted ($b=0$), both $a_\alpha$ and $a_p$ equal 0, which indicates that the SU is unable to take any action. When the battery state is greater than or equal to 1, the action set can be written as $\hat{A}_s = \{(a_\alpha,a_p) \mid a_\alpha = 1,\ a_p \in \{0,1,\ldots,b-1\}\}$. For a system state $s=(b,g,h)$, let $\bar{V}(b,g,h)$ be the optimal long-term expected reward, which satisfies Bellman's equation: $\bar{V}(b,g,h) = \max_{0 \le a_p \le b-1_{b>0}} \left\{ r(b,g,h,1_{b>0},a_p) + \lambda \left[ P(O_0)\,\tilde{V}(b-1_{b>0},g,h) + P(O_1)\,\tilde{V}(b-1_{b>0}-a_p,g,h) \right] \right\}$ (27) where $\tilde{V}(\tilde{b},g,h) = E_{g',h'}\left[\bar{V}(\min\{\tilde{b}+Q_{h'},N_B-1\},g',h') \mid g,h\right] = \sum_{g' \in G} \sum_{h' \in H} P(g' \mid g)\,P(h' \mid h)\,\bar{V}(\min\{\tilde{b}+Q_{h'},N_B-1\},g',h')$. (28) For this special case, we can also derive the optimal policy according to Algorithm 1 by computing $\bar{V}_{i+1}(b,g,h)$ instead of $V_{i+1}(b,g,h)$, where $\bar{V}_{i+1}(b,g,h) = \max_{0 \le a_p \le b-1_{b>0}} \left\{ r(b,g,h,1_{b>0},a_p) + \lambda \left[ P(O_0)\,\tilde{V}_i(b-1_{b>0},g,h) + P(O_1)\,\tilde{V}_i(b-1_{b>0}-a_p,g,h) \right] \right\}$ (29) and $\tilde{V}_i(\tilde{b},g,h) = E_{g',h'}\left[\bar{V}_i(\min\{\tilde{b}+Q_{h'},N_B-1\},g',h') \mid g,h\right]$, (30) and $i$ is the iteration index. Moreover, we can prove some properties of the functions $\bar{V}_i(b,g,h)$ and $\tilde{V}_i(b,g,h)$, and thereby obtain an efficient sensing-transmission (EST) policy with reduced computational complexity. Lemma 5.2 When the primary SNR is sufficiently high, for arbitrary $g \in G$ and $h \in H$, $\bar{V}_i(b,g,h)$ and $\tilde{V}_i(\tilde{b},g,h)$ are concave in $b$ and $\tilde{b}$, respectively, where $b \ge 1$ and $\tilde{b} \ge 0$. Proof The proof is in Appendix 1.1.□ Furthermore, we obtain the following theorem. Theorem 5.4 When the primary SNR is sufficiently high, for arbitrary $g \in G$ and $h \in H$, the optimal transmit power $\hat{a}_p^*(b,g,h) = \min\left\{ a_p' \in \arg\max_{0 \le a_p \le b-1_{b>0}} \left\{ r(b,g,h,1_{b>0},a_p) + \lambda \left[ P(O_0)\,\tilde{V}(b-1_{b>0},g,h) + P(O_1)\,\tilde{V}(b-1_{b>0}-a_p,g,h) \right] \right\} \right\}$ (31) is monotonically non-decreasing with respect to $b$. That is, given any $g \in G$ and $h \in H$, we have $\hat{a}_p^*(b,g,h) \ge \hat{a}_p^*(b-1,g,h),\ \forall b \in B \setminus \{0\}.$
(32) Proof The proof is in Appendix 1.2.□ Theorem 5.4 shows that, when the primary SNR is sufficiently high, the optimal transmit power is monotonically non-decreasing in the battery state $b$. With the system parameters presented in Table 1, and using the value iteration algorithm given in Algorithm 1, the optimal transmit power is depicted in Fig. 3. From Fig. 3, we can see that the optimal transmit power is non-decreasing in $b$, which is consistent with Theorem 5.4.
Algorithm 1 OST policy.
1: Set $V_0(s)=0$ for all $s \in B \times G \times H$, set $i=1$, and choose a small $\epsilon > 0$.
2: For each $s \in B \times G \times H$ do
3:   For each $a=(a_\alpha,a_p) \in A_s$ do calculate $V_i^a(s) = r(s,a_\alpha,a_p) + \lambda \sum_{j=0,1} P(O_j) \sum_{s' \in S} P(s' \mid s,a_\alpha,a_p;O_j)\,V_{i-1}(s')$
4:   end For
5:   calculate $V_i(s) = \max_{a \in A_s} V_i^a(s)$
6: end For
7: If $\max_{s \in S} |V_i(s) - V_{i-1}(s)| < \epsilon(1-\lambda)/(2\lambda)$, go to Step 8. Otherwise, increase $i$ by 1 and go back to Step 2.
8: For each $s \in S$, obtain the OST policy $\pi^*(s) = \arg\max_{a \in A_s} \left\{ r(s,a) + \lambda \sum_{j=0,1} P(O_j) \sum_{s' \in S} P(s' \mid s,a;O_j)\,V_i(s') \right\}$
Figure 3. Optimal transmit power versus battery energy states, channel states and arrival energy states. The primary signal's SNR is $\beta = -10$ dB. (a) $h=0$, (b) $h=1$, (c) $h=2$ and (d) $h=3$.
With the monotonic structure of the optimal transmit power established in Theorem 5.4, we propose the efficient sensing-transmission (EST) policy shown in Algorithm 2, which significantly reduces the computational complexity of Algorithm 1. The first main difference between Algorithm 1 and Algorithm 2 is that the action set of the sensing overhead is simplified to $\{1_{b>0}\}$ in Algorithm 2. In addition, calculating $\bar{V}_{i+1}(b,g,h)$ in Algorithm 2 has lower complexity than the corresponding calculation in Algorithm 1. Concretely, in step 5 of Algorithm 2, for given $g$ and $h$, we have $\hat{a}_p^{i+1}(b+1,g,h) \ge \hat{a}_p^{i+1}(b,g,h)$ according to Theorem 5.4. Hence, when we calculate $V_{i+1}(b+1,g,h)$ and find $\hat{a}_p^{i+1}(b+1,g,h)$, we can search for the optimal action in the interval $[\hat{a}_p^{i+1}(b,g,h),\ b-1_{b>0}]$ instead of the longer interval $[0,\ b-1_{b>0}]$. Next, we analyze the complexity of Algorithm 2. The total number of states in the state space is $N_B \cdot N_G \cdot N_H$. Similar to the analysis of Algorithm 1, the total number of states to which the current system state can transition is $3N_H$.
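The restricted search that underlies step 5 of Algorithm 2 can be sketched as follows. This is a hedged illustration rather than the paper's implementation: `objective(b, a)` is a hypothetical stand-in for the bracketed expression maximized in Algorithm 2 (for fixed g and h), and the sketch assumes, as Theorem 5.4 asserts for the high-SNR case, that the smallest maximizer is non-decreasing in b.

```python
def smallest_argmax(objective, lo, hi):
    """Return the smallest a in [lo, hi] that maximizes objective(a)."""
    best_a, best_val = lo, objective(lo)
    for a in range(lo + 1, hi + 1):
        val = objective(a)
        if val > best_val:  # strict '>' keeps the smallest maximizer on ties
            best_a, best_val = a, val
    return best_a


def monotone_policy_search(objective, n_b):
    """Find the optimal action for every battery state b = 0, ..., n_b - 1.

    For each b the search starts at the optimum found for b - 1 instead of 0,
    which is valid only if the optimal action is non-decreasing in b.
    """
    policy = []
    a_low = 0
    for b in range(n_b):
        a_max = b - 1 if b > 0 else 0  # upper limit b - 1_{b>0}
        a_star = smallest_argmax(lambda a: objective(b, a),
                                 min(a_low, a_max), a_max)
        policy.append(a_star)
        a_low = a_star  # lower limit for the next battery state
    return policy


def full_policy_search(objective, n_b):
    """Reference search over the whole interval [0, b - 1_{b>0}]."""
    return [smallest_argmax(lambda a: objective(b, a), 0, b - 1 if b > 0 else 0)
            for b in range(n_b)]
```

When the monotonicity assumption holds, both functions return the same policy, but the restricted version evaluates the objective fewer times because each search interval begins at the previous optimum.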
The maximum number of actions regarding the transmit power is $N_B$, and the maximum number of actions regarding the sensing overhead is two. Therefore, the complexity of each iteration in Algorithm 2 is $O(N_B^2 N_H^2 N_G)$.
Algorithm 2 EST policy.
1: Set $V_0(s)=0$ for all $s \in B \times G \times H$, set $i=1$, and choose a small $\epsilon > 0$.
2: For each $g \in G$, $h \in H$ do set $b=0$, $a_p^l=0$
3:   While $b \le N_B-1$ do
4:     calculate $\bar{V}_i(b,g,h) = \max_{a_p^l \le a_p \le b-1_{b>0}} \left\{ p_i\,(1-P_f(1_{b>0}\alpha_u)) \cdot E_{g'}\left[u(1_{b>0}\alpha_u, a_p p_u, g') \mid g\right] + \lambda \left[ P(O_0)\,\tilde{V}_i(b-1_{b>0},g,h) + P(O_1)\,\tilde{V}_i(b-1_{b>0}-a_p,g,h) \right] \right\}$
5:     calculate $\hat{a}_p^i(b,g,h) = \min\left\{ a_p' \in \arg\max_{a_p^l \le a_p \le b-1_{b>0}} \left\{ p_i\,(1-P_f(1_{b>0}\alpha_u)) \cdot E_{g'}\left[u(1_{b>0}\alpha_u, a_p p_u, g') \mid g\right] + \lambda \left[ P(O_0)\,\tilde{V}_i(b-1_{b>0},g,h) + P(O_1)\,\tilde{V}_i(b-1_{b>0}-a_p,g,h) \right] \right\} \right\}$
6:     Set $a_p^l := \hat{a}_p^i(b,g,h)$, then set $b := b+1$
7:   end While
8: end For
9: If $\max_{s \in S} |\bar{V}_i(s) - \bar{V}_{i-1}(s)| < \epsilon(1-\lambda)/(2\lambda)$, go to Step 10. Otherwise, increase $i$ by 1 and go back to Step 2.
10: For each $s \in S$, obtain the EST policy $\pi^*(s) = \arg\max_{a \in \hat{A}_s} \left\{ r(s,a) + \lambda \sum_{j=0,1} P(O_j) \sum_{s' \in S} P(s' \mid s,a;O_j)\,\bar{V}_i(s') \right\}$
6. NUMERICAL RESULTS AND DISCUSSION We present simulation results to evaluate the performance of the proposed policies in this section. The values of the parameters are listed in Table 1 and are drawn mainly from [23, 28, 36]. The unit of energy quantum is $e_u = 0.5$ mJ. The total number of battery states is $N_B = 20$. The thresholds of the channel power are $G_0=0$, $G_1=0.3$, $G_2=0.6$, $G_3=1.0$, $G_4=2.0$, $G_5=3.0$. The arrival energy takes values from the finite set $\{e_u, 3e_u, 5e_u, 7e_u\}$ per time slot, namely $Q_0=1$, $Q_1=3$, $Q_2=5$, $Q_3=7$, and the arrival energy state evolves with the transition matrix given by $P_h = \begin{bmatrix} P_{0,0} & P_{0,1} & P_{0,2} & P_{0,3} \\ P_{1,0} & P_{1,1} & P_{1,2} & P_{1,3} \\ P_{2,0} & P_{2,1} & P_{2,2} & P_{2,3} \\ P_{3,0} & P_{3,1} & P_{3,2} & P_{3,3} \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 & 0 & 0 \\ 0.25 & 0.5 & 0.25 & 0 \\ 0 & 0.25 & 0.5 & 0.25 \\ 0 & 0 & 0.5 & 0.5 \end{bmatrix}.$
(33) $\epsilon$ is set to $10^{-2}$. The initial state is $s_0=(2,1,1)$, and each policy is executed for 500 slots. All numerical results are averaged over 1000 independent runs. We compare the performance of the proposed OST and EST policies with a benchmark called the shortsighted policy [26, 44]. The shortsighted policy focuses only on maximizing the immediate reward, without taking into account the expected future reward, i.e. it corresponds to $\lambda=0$. In contrast, the policies proposed in this paper consider not only the immediate reward acquired in the current time slot, but also the expected reward available in the future. Therefore, comparison with the shortsighted policy reveals the benefit of the proposed policies. In addition, we compare the proposed policies with a deterministic energy harvesting scheme from [45], the t-time fair rate assignment (t-TFR) scheme, which requires perfect knowledge of the arrival energy within the duration t. The t-TFR scheme aims to maximize the sum of the rewards over a short-term period of t time slots by adjusting its transmit power to the system states. As for spectrum sensing, we assume that the t-TFR scheme senses the primary channel at the lowest level of sensing overhead whenever the battery storage is positive. Figure 4 shows the expected throughput of the OST, EST, shortsighted and 3-TFR policies for different primary SNRs β and channel idle probabilities. We observe that the OST policy surpasses the shortsighted policy in terms of expected throughput for all settings of the primary SNR. This is because the OST policy chooses its action by jointly considering the current reward and the reward available in the future, whereas the shortsighted policy considers only the current reward, disregarding the impact of the action on future time slots.
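The contrast between a farsighted policy and the shortsighted one (λ=0) can be made concrete with a deliberately simplified toy model. The sketch below is not the paper's MDP: it ignores spectrum sensing and channel fading, uses a hypothetical reward log2(1+a) for spending a energy quanta, and keeps only the battery update min{b̃+Q_{h'}, N_B−1} together with the arrival-energy quanta and the transition matrix of Equation (33). It runs value iteration with the stopping rule of Algorithm 1.

```python
import math

# Arrival-energy quanta and transition matrix from the simulation setup.
Q = [1, 3, 5, 7]
P_H = [
    [0.5, 0.5, 0.0, 0.0],
    [0.25, 0.5, 0.25, 0.0],
    [0.0, 0.25, 0.5, 0.25],
    [0.0, 0.0, 0.5, 0.5],
]
N_B = 20      # number of battery states
N_H = len(Q)  # number of arrival-energy states


def toy_reward(a):
    """Hypothetical per-slot reward for spending a energy quanta."""
    return math.log2(1.0 + a)


def expected_future(V, b_left, h):
    """Expected next-slot value when b_left quanta remain in arrival state h."""
    return sum(P_H[h][h2] * V[min(b_left + Q[h2], N_B - 1)][h2]
               for h2 in range(N_H))


def value_iteration(lam, eps=1e-2):
    """Return (V, policy) for discount factor lam; lam = 0 is shortsighted."""
    V = [[0.0] * N_H for _ in range(N_B)]
    while True:
        V_new = [[max(toy_reward(a) + lam * expected_future(V, b - a, h)
                      for a in range(b + 1))
                  for h in range(N_H)]
                 for b in range(N_B)]
        diff = max(abs(V_new[b][h] - V[b][h])
                   for b in range(N_B) for h in range(N_H))
        V = V_new
        # Stopping rule of Algorithm 1; for lam = 0 one sweep is already exact.
        if lam == 0.0 or diff < eps * (1.0 - lam) / (2.0 * lam):
            break
    # Greedy policy with respect to the converged value function.
    policy = [[max(range(b + 1),
                   key=lambda a: toy_reward(a)
                   + lam * expected_future(V, b - a, h))
               for h in range(N_H)]
              for b in range(N_B)]
    return V, policy
```

Calling value_iteration(0.0) yields the shortsighted behaviour of spending the whole battery in every slot, whereas a discount factor such as the hypothetical value 0.9 produces a policy that holds some energy back for future slots; comparing the two policy tables shows the effect discussed above in a setting small enough to inspect by hand.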
We can also see that the expected throughput of the OST policy is superior to that of the t-TFR scheme, even though the t-TFR scheme is assumed to predict the short-term arrival energy perfectly. This is because, although the t-TFR scheme can optimize its transmit power, its spectrum sensing duration is fixed and cannot adapt to the system state; moreover, t-TFR only maximizes the total throughput within t time slots, while the OST policy maximizes the long-term expected throughput. Moreover, the EST policy performs worse than the OST policy at low primary SNRs, whereas when the primary SNR β is high (e.g. higher than −13 dB), the performance curves of EST and OST completely overlap and eventually saturate. This is because when β is sufficiently high, the action set of the sensing overhead degenerates to the set $\hat{A}^{\alpha}_s$ given in Equation (24), so the EST policy is equivalent to the OST policy. Finally, the saturated expected throughput of the four policies in the high-SNR region becomes higher as pi gets higher. This is because a higher pi implies a higher probability of utilizing the idle primary channel to transmit data, which brings about a higher expected throughput. Figure 4. Expected throughput vs. primary signal's SNR. Figure 5 illustrates the expected throughput of the four policies for different channel idle probabilities pi and different primary SNRs, where the performance curves plotted correspond to β=−16 dB and β=−14 dB, respectively. It is shown that, for a fixed primary SNR, the OST policy offers better performance than the other policies under all settings of pi.
Moreover, the expected throughput of the four policies strictly increases as $p_i$ grows, since a higher idle probability means a higher chance of successful data transmission and hence a higher SU throughput. In addition, when $\beta$ is relatively small ($\beta=-16$ dB), the gap between the EST and OST policies is large, and the shortsighted policy is superior to the EST policy. When $\beta$ is relatively large ($\beta=-14$ dB), the gap between the OST and EST policies is slight and the EST policy attains almost the same performance as the shortsighted policy. This observation is consistent with Fig. 4: the performance of EST gradually converges to that of OST as $\beta$ increases. Furthermore, although the $t$-TFR scheme achieves a large fraction of the throughput available under the EST policy, it suffers from high computational complexity, and perfectly predicting the arriving energy is hard in practice because energy harvesting is sporadic and random.

Figure 5. Expected throughput vs. channel idle probability.

Figure 6 illustrates the expected throughput of the four policies versus the average channel gain $G_a$ for different $\beta$. The performance of all four policies increases with $G_a$: as $G_a$ grows, the data transmission rate increases, leading to a higher expected throughput. Moreover, as expected, the OST policy performs better than the shortsighted policy. Further, the OST policy outperforms the $t$-TFR scheme since it optimizes both the sensing duration and the transmit power to maximize the long-term throughput. The gap between the OST and EST policies also shrinks as $\beta$ increases.
As previously discussed, the performance of EST approaches that of OST as $\beta$ increases, which narrows the gap between them.

Figure 6. Expected throughput vs. channel idle probability.

The expected throughput for different numbers of battery energy states $N_B$ and primary SNRs is shown in Fig. 7. The expected throughput of the OST and EST policies can be enhanced by enlarging the battery energy buffer to store more energy quanta, especially when the received primary SNR $\beta$ is high. For instance, the OST policy with $N_B=24$ at $\beta=-14$ dB achieves 0.64 bit/time slot/Hz, nearly 1.3 times that achieved by the same policy at $N_B=4$. This is because, with a higher $N_B$, the SU can exploit the energy more effectively. Specifically, if the channel quality is expected to be good and the amount of arriving energy is expected to be large, the SU should raise the transmit power to increase the expected throughput of the current slot; if instead the channel quality is expected to be bad and the amount of arriving energy is expected to be small, it is better to lower the transmit power level so as to store more energy for future use. However, the performance of the shortsighted policy remains almost unchanged once the number of battery states $N_B$ exceeds the largest arrival energy quanta (e.g. $Q_3=7$). This is because the shortsighted policy allocates all the energy stored in the battery to data transmission in each time slot; therefore, its performance is constrained by the storage capacity if the battery capacity is too small to store the arriving energy (e.g.
NB0, in order to prove Theorem 5.4, we intend to show that $\hat{a}_p^{i+1}(b,g,h)$, defined as

$$\hat{a}_p^{i+1}(b,g,h)=\min\Big\{a_p'\in\arg\max_{0\le a_p\le b-1}\big\{p_i[(1-\alpha_u)T]\,E_{g'}[u(\alpha_u,a_p p_u,g')\mid g]+\lambda P(O_0)\tilde{V}^{i}(b-1,g,h)+\lambda P(O_1)\tilde{V}^{i}(b-1-a_p,g,h)\big\}\Big\} \quad \text{(A.6)}$$

is monotonically increasing in $b$ for all $i$. We denote $f(a_p)=p_i[(1-\alpha_u)T]\,E_{g'}[u(\alpha_u,a_p p_u,g')\mid g]$, $h_i(b)=\lambda P(O_0)\tilde{V}^{i}(b,g,h)$ and $g_i(b)=\lambda P(O_1)\tilde{V}^{i}(b,g,h)$. For brevity, we drop the arguments $g$ and $h$ in all functions. Then we have

$$\bar{V}^{i+1}(b)=\max_{0\le a_p\le b-1}\{f(a_p)+g_i(b-1-a_p)+h_i(b-1)\}. \quad \text{(A.7)}$$

Let $a_p^{l}(b)$ and $a_p^{u}(b)$ denote the lower and upper bounds of the feasible actions $a_p$. Based on (A.7), we have $a_p^{l}(b)=0$ and $a_p^{u}(b)=b-1$, and both $a_p^{l}(b)$ and $a_p^{u}(b)$ are non-decreasing in $b$. According to Lemma A.1, it is sufficient to prove that the function $f(a_p)+g_i(b-1-a_p)+h_i(b-1)$ has non-decreasing differences in $(b,a_p)$, that is, for arbitrary $b'>b$ and $a_p'>a_p$,

$$\big(f(a_p')+g_i(b'-1-a_p')+h_i(b'-1)\big)-\big(f(a_p')+g_i(b-1-a_p')+h_i(b-1)\big)\ge\big(f(a_p)+g_i(b'-1-a_p)+h_i(b'-1)\big)-\big(f(a_p)+g_i(b-1-a_p)+h_i(b-1)\big). \quad \text{(A.8)}$$

The terms in $f$ and $h_i$ cancel on both sides, so inequality (A.8) can be rewritten as

$$g_i(b'-1-a_p')-g_i(b-1-a_p')\ge g_i(b'-1-a_p)-g_i(b-1-a_p). \quad \text{(A.9)}$$

According to Lemma 5.2, $g_i(b)=\lambda P(O_1)\tilde{V}^{i}(b,g,h)$ is concave in $b$, since $\lambda P(O_1)$ is a constant and $\tilde{V}^{i}(b,g,h)$ is concave in $b$. By the concavity property, we have

$$g_i(u+\delta)-g_i(u)\ge g_i(v+\delta)-g_i(v),\quad \forall\, u\le v,\ \delta\ge 0. \quad \text{(A.10)}$$

Substituting $u=b-1-a_p'$, $v=b-1-a_p$ and $\delta=b'-b$ into (A.10) gives (A.9). With (A.9) established through Lemma 5.2, Lemma A.1 implies that $\hat{a}_p^{i+1}(b,g,h)$ is monotonically increasing in $b$ for any given $g$ and $h$ and for all $i$, where $b>0$. When $b=0$, from (25) the only transmit power action is $a_p=0$, hence $\hat{a}_p^{i+1}(0,g,h)=\hat{a}_p^{i+1}(1,g,h)=0$. Therefore, $\hat{a}_p^{i+1}(b,g,h)$ is monotonically increasing in $b$ for any given $g$ and $h$ and for all $i$, where $b\ge 0$. Thus, $\hat{a}_p^{*}(b,g,h)=\hat{a}_p^{\,i\to+\infty}(b,g,h)$ is increasing in $b$ for given $g$ and $h$. The proof is complete.

© The British Computer Society 2018. All rights reserved.
For permissions, please e-mail: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model) TI - Joint Optimization of Spectrum Sensing and Transmit Power in Energy Harvesting Cognitive Radio Sensor Networks JF - The Computer Journal DO - 10.1093/comjnl/bxy070 DA - 2019-02-01 UR - https://www.deepdyve.com/lp/oxford-university-press/joint-optimization-of-spectrum-sensing-and-transmit-power-in-energy-kDOGA9YL02 SP - 215 VL - 62 IS - 2 DP - DeepDyve ER -