# Analysis of dynamic reliability surveillance: a case study

## Abstract

In this paper a reliability model based on artificial neural networks and the generalized renewal process is developed. The model is used for failure prediction and is able to adapt dynamically to changes in the operating and environmental conditions of assets. The model is implemented for a thermal solar power plant, focusing on critical elements of these plants: heat transfer fluid pumps. We affirm that this type of model can be easily automated within the plant’s remote monitoring system. Using this model we can dynamically assign reference values for warnings and alarms and provide predictions of asset degradation. These in turn can be used to evaluate the associated economic risk to the system under existing operating conditions and to inform preventive maintenance activities.

## 1. Introduction

The consistency of reliability predictions in the renewable energy sector is important because of the high impact of production losses. These predictions are complex to generate because assets’ operating conditions change seasonally and geographically. In this sector, the optimization of maintenance programs for an asset must consider operational variables (configurations, preventive maintenance, undue handling, etc.) as well as environmental conditions (cleanliness, fastening, temperature, etc.). Reliability estimations that take these contributing factors into account are therefore informative. In addition, updating these estimations over time and properly considering explanatory variables, or covariates, are critical to predicting time to system failure. There are many techniques for survival analysis and estimation (Cox & Oakes, 1984; Smith, 2002) that use explanatory variables.
These techniques can be parametric when the failure distributions are known, semi-parametric when the failure distribution is unknown but assumptions of proportionality over time are defined for the covariates (assumed independent of one another), or non-parametric when the failure distributions are not specified (Lee & Wang, 2003; Hougaard, 2012). Flexibility and complexity of computational implementation increase from parametric to non-parametric methods (Ohno-Machado, 2001).

In the search for efficiency, artificial intelligence (AI) can be used to solve these prediction problems computationally (Kalogirou, 2007; Cohen & Feigenbaum, 2014), and artificial neural network (ANN) models are well-accepted and recommended methods for this purpose (Mellit et al., 2005; Martín et al., 2010; Caner et al., 2011; Kuo, 2011). These networks are composed of multiple connected units (neurons). The standard ANN architecture has one input layer, one output layer and, generally, one or more hidden layers. To achieve auto-adjustment, an often used ANN is the backpropagation neural network (BANN), which prevents overtraining. This technique filters noise and recognizes the most overt and accessible patterns, outperforming conventional linear and polynomial methods by an order of magnitude (Lapedes & Farber, 1987). Consequently, BANN provides a strong tolerance to noisy data, due to the storage of redundant information, and adaptation in the presence of explicit knowledge for the resolution of problems (Malcolm et al., 1999; Curry et al., 2000).

In our proposed prediction model, the neural network under consideration is a feed-forward perceptron with a single hidden layer (Rosenblatt, 1958), composed of an input layer with $$P$$ neurons, an intermediate or hidden layer with $$M$$ neurons, and an output layer with one neuron, $$Y$$. The network output $$Y$$ is a linear combination, through a set of weights, of the outputs of the hidden-layer neurons.
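The architecture just described can be illustrated with a minimal forward-pass sketch. It assumes the logistic activation that the paper specifies for the hidden layer in Section 2; the function names, the bias terms and all weight values are hypothetical and chosen only for illustration.

```python
import math

def logistic(x):
    """Logistic sigmoid e^x / (1 + e^x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Output Y of a network with P inputs, M hidden neurons and one output.

    hidden_weights is an M x P list of lists; output_weights has length M.
    """
    hidden = [
        logistic(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    # The output neuron is a linear combination of the hidden-layer outputs.
    return sum(v * h for v, h in zip(output_weights, hidden)) + output_bias

# Hypothetical weights for P = 2 inputs and M = 2 hidden neurons.
hidden_w = [[0.5, -0.25], [1.0, 0.75]]
hidden_b = [0.0, -0.5]
output_w = [0.6, 0.4]
y = forward([0.3, 0.9], hidden_w, hidden_b, output_w, 0.0)
print(y)
```

Because each hidden output lies between 0 and 1, the output here is bounded by the sum of the output weights; training then consists of adjusting all weights to reduce the prediction error.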
All the weights used in the linear combination are learned by a back-propagation algorithm (Lawrence et al., 1998), in which the sum of squared errors

$$R^{2}=\sum_{i=1}^{N}R_{i}^{2}=\sum_{i=1}^{N}\left( Y^{(i)}-f\left( X^{(i)} \right) \right)^{2} \qquad (1)$$

is minimized in a loop with a maximum of $$S$$ steps. In each step $$s$$ a forward pass and a backward pass are executed over the $$N$$-element training set $$\mathrm{TS}=\left\{ \left( X^{(i)},Y^{(i)} \right)\vert i\in \left\{ 1,\ldots ,N \right\} \right\}$$, where $$X^{(i)}=\left( X_{1}^{(i)},X_{2}^{(i)},\ldots ,X_{P}^{(i)} \right)$$ is an input vector and $$Y^{(i)}$$ is the corresponding output value from the training set.

This paper is fundamentally concerned with the use of a BANN to obtain predictions about failures before they occur. It is structured in three main parts: Section 2 develops our prediction model; Sections 3 and 4 present a case study; and a final section is dedicated to the conclusions.

## 2. Proposed prediction model

In this section we assume that sufficient experimental data about failures are available. Due to their experimental nature, these raw data do not follow a formal parametric failure distribution function, and they have non-linear correlations with the covariates that accelerate the appearance of failures. For simplicity, we assume that these covariates are mutually independent and not time-dependent.
In these cases, maintenance decision making for renewable energy equipment under different operating environments can be supported by ANN, with the following benefits:

- Suitable to the amount of failure data managed: the more information is provided, the more relevant the result becomes, in a process of continuous improvement.
- Implementable in remote monitoring systems, seeking automation not only of prediction but also of self-adjustment based on real data.
- Flexible to changes, whether by eliminating a covariate, incorporating new covariates in the input layer, or combining several ANNs hierarchically.

Numerous papers (Ravdin & Clark, 1992; Liestøl et al., 1994; Faraggi & Simon, 1995; Biganzoli et al., 1998; Xiang et al., 2000; Ohno-Machado, 2001) compare existing methods for fitting survival functions that relate reliability to the covariates; for instance, comparing the widely applied semi-parametric Cox proportional hazards model (PHM) (Cox, 1972) with several ANN models. When model complexity is low, with a few covariates that relate proportionally to reliability, there are no significant differences between the predictions of Cox regression and ANN models. For complex models with many covariates and with interaction terms, the predictions of ANN models have important advantages over Cox regression models.

In practice, depending on the way that covariates are considered, different ANN models can be built; for instance:

- An ANN can be used instead of the linear combination of weight coefficients in the Cox PHM, as in Faraggi & Simon (1995). In this case it is necessary to solve the PHM using Partial Maximum Likelihood Estimation (P-MLE).
- An input with the survival status over disjoint time intervals can be used, with the covariate values replicated in each interval. A binary variable takes the value 0 before the interval of the failure and 1 at the failure event or later, as in Liestøl et al. (1994). In this case each time interval is an input with a survival status, so a vector of survival statuses is defined per failure.
- The Kaplan–Meier (K–M) estimator can be employed to define the time intervals as two additional inputs instead of a vector: one is the sequence of time intervals defined by K–M, and the other is the survival status at each time of the sequence. This is the case of Ravdin & Clark (1992) and Biganzoli et al. (1998), models that are known as proportional K–M.

Since the aim of this research is to improve the results of parametric methods by combining them with the self-adaptive property of ANN, the proposed survival ANN model is based on the ideas of Ravdin & Clark (1992). Some mathematical modifications are introduced for simplification and to facilitate reliability surveillance (see Table 1). With these modifications, the application of the proposed model is sequenced in two phases.

In a first phase, the outcomes of a parametric method are reused as a prior estimation, without considering covariates; this eases the understanding of results and allows other reliability analyses to be applied to the times to failure. The result is a parametric estimation of the survival curve. Instead of the K–M estimator, the General Renewal Weibull Process II (GRP-II) method is selected in order to fit the curve better and to reduce the negative effect of a survival curve that does not decrease monotonically. In our study we therefore propose to combine the ANN with the GRP-II parametric model, which evaluates survival probability in repairable and non-repairable systems and estimates the efficiency of repairs in avoiding the overall damage produced by all the successive failures (González-Prida et al., 2014).
The GRP-II model is more accurate than GRP-I for complex systems, or with data from multiple devices of the same type (Dagpunar, 1997). GRP-II gives three parameters: the Weibull scale $$\alpha$$, the Weibull shape $$\beta$$ and the repair efficiency $$q$$. The $$q$$ parameter captures the recurrence rate of failures, showing the effect of the repairs on the age of system $$n$$ (modelling partial renewal repair). Properly performed repairs ($$0<q_{n}<1$$) may improve the virtual state (life) of the system, while poorly solved failures ($$q_{n}>1$$) could aggravate it (always speaking in terms of reliability). The virtual age of system $$n$$ is updated after a failure $$j$$ according to Equation 2, where $$T_{nj}$$ are the time intervals between successive failures, and the failure probability distribution conditioned on survival to the new virtual age is calculated in Equation 3:

$$V_{n}^{\mathrm{new}}=q_{n}\cdot \left( V_{n}^{\mathrm{old}}+T_{nj} \right) \qquad (2)$$

$$F\left( t\vert V_{n}^{\mathrm{new}} \right)=P\left[ T_{nj}\leqslant t\vert T_{nj}>V_{n}^{\mathrm{new}} \right]=\frac{F(t)-F\left( V_{n}^{\mathrm{new}} \right)}{1-F\left( V_{n}^{\mathrm{new}} \right)} \qquad (3)$$

GRP-II is applied, based only on the times to failure ($$T_{nj}$$), over $$k$$ groups of failures with similar covariate values, for example one group per plant. The parameters $$\alpha$$, $$\beta$$ and $$q$$ are then obtained for each group without using the covariates. For a Weibull distribution, the conditional survival probability that corresponds to Equation 3 is

$$1-F\left( t\vert V_{nk}^{\mathrm{new}} \right)=\exp \left[ \left( \frac{V_{nk}^{\mathrm{new}}}{\alpha_{nk}} \right)^{\beta_{nk}}-\left( \frac{t}{\alpha_{nk}} \right)^{\beta_{nk}} \right] \qquad (4)$$

After that, in order to adapt the parametric estimation of the survival function according to the covariates, the back-propagation ANN is used with the following criteria. A sigmoid (logistic) function

$$f(x)=\frac{e^{x}}{1+e^{x}}, \qquad (5)$$

over variables normalized between 0 and 1, is used for the hidden-layer neurons, and a linear combination of their outputs determines the network output. It is not recommended to use more than twice the number of input neurons in the hidden layer (Lawrence et al., 1998).
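The virtual-age update of Equation 2 and the conditional probabilities of Equations 3 and 4 can be sketched as follows. This is only an illustration under stated assumptions: the scale and shape are the plant 1 group values reported later in Table 2, the repair efficiency `Q = 0.5` is a hypothetical value (the paper does not report it in this section), and the three times to failure are treated as if they were successive failures of a single system.

```python
import math

def weibull_cdf(t, alpha, beta):
    """Weibull CDF F(t) = 1 - exp(-(t / alpha) ** beta)."""
    return 1.0 - math.exp(-((t / alpha) ** beta))

def update_virtual_age(v_old, t_between, q):
    """Equation 2: V_new = q * (V_old + T)."""
    return q * (v_old + t_between)

def conditional_failure_probability(t, v, alpha, beta):
    """Equation 3: probability of failing by age t given survival to age v."""
    if t < v:
        raise ValueError("t must not be smaller than the virtual age v")
    f_v = weibull_cdf(v, alpha, beta)
    return (weibull_cdf(t, alpha, beta) - f_v) / (1.0 - f_v)

def conditional_survival(t, v, alpha, beta):
    """Equation 4: exp[(v / alpha) ** beta - (t / alpha) ** beta]."""
    return math.exp((v / alpha) ** beta - (t / alpha) ** beta)

ALPHA, BETA = 220.0, 3.7   # plant 1 group values from Table 2
Q = 0.5                    # hypothetical repair efficiency, not taken from the paper
times_between_failures = [299.53, 277.76, 163.78]  # first three plant 1 rows of Table 2

virtual_age = 0.0
history = []
for t_between in times_between_failures:
    virtual_age = update_virtual_age(virtual_age, t_between, Q)
    history.append(virtual_age)

# Evaluate the conditional probabilities 50 time units beyond the last virtual age.
t_eval = history[-1] + 50.0
p_fail = conditional_failure_probability(t_eval, history[-1], ALPHA, BETA)
p_surv = conditional_survival(t_eval, history[-1], ALPHA, BETA)
print(history, p_fail, p_surv)
```

With $$q=0$$ the update resets the virtual age to zero after each repair, and with $$q=1$$ the age simply accumulates, which is why intermediate values model partial renewal.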
Discretized times inside the intervals $$T_{nkj}$$, over $$k$$ groups of failures with similar covariate values $$X_{nk}$$, are defined according to the available and representative information for the selected failure. So that the model can be updated iteratively with real data for all the intervals, the real covariate values are used as inputs $$X_{nk}^{(i)}$$, but only up to the time the failure occurs; after that time the average covariate value is taken, $$X_{nk}^{(i)}=\dot{X}_{nk}$$. In order to homogenize the selected intervals $$T_{nkj}$$, all of them are extended after the failure time, in discretized times $$t^{(i)}$$ of the training set, up to the maximum length of the intervals (which corresponds to the maximum time to failure), so that all the resulting intervals $$\dot{T}_{nkj}$$ have the same length:

$$\mathrm{TS}_{nk}=\left\{ \left( X_{nk}^{(i)},Y_{nk}^{(i)} \right)\vert i\in \left\{ 1,\ldots ,N \right\} \right\} \qquad (6)$$

The additional input of the survival status has, in our model, a gradual increment from 0 to 1 at the failure event or after each specific time to failure. To do this, the Weibull cumulative distribution function (CDF) can be used for all intervals, with the $$\beta_{nk}$$ previously obtained per group by the GRP-II method, so as to maintain the shape of the predictions. In addition, to keep the gradual increment of the survival status from 0 to 1 proportional, the Weibull scale $$\alpha_{nkj}$$ in each interval is adapted to the specific time interval $$T_{nkj}$$ (between successive failures), weighted by the median life.
Therefore, the output layer contains a single output neuron corresponding to the estimated survival status (probability of failure), which ascends from 0 to 1 up to the time to failure and afterwards; see Equation 7, where, for all the intervals, $$\beta_{nk}$$ is obtained by GRP-II for each group of times to failure and $$\alpha_{nkj}=T_{nkj}\cdot \mathrm{Ln}(2)^{1/\beta_{nk}}$$:

$$\mathrm{CDF}\left( t^{(i)} \right)=1-\left[ 1/\exp \left( \frac{t^{(i)}}{\alpha_{nkj}} \right)^{\beta_{nk}} \right]=1-\left[ 1/\exp \left( \frac{t^{(i)}}{T_{nkj}\cdot \mathrm{Ln}(2)^{1/\beta_{nk}}} \right)^{\beta_{nk}} \right] \qquad (7)$$

The training of the network, which gives us the network settings, is carried out on 75% of the available data; the other 25% is used for network testing in order to subsequently validate the behaviour pattern. The back-propagation learning algorithm used is a supervised error correction that minimizes the penalized mean square error through the Quasi-Newton method in the free software R.

$$\mathrm{RMSE}=\sqrt{\frac{\sum_{i=1}^{N}\left( Y^{(i)}-f\left( X^{(i)} \right) \right)^{2}}{N}} \qquad (8)$$

Table 1 Training set of normal and modified R&C ANN

| Interval | $$t^{(i)}$$ | Normal R&C ANN: $$X_{1n1}$$ | Normal R&C ANN: $$X_{2n1}$$ | Normal R&C ANN: survival status | Modified R&C ANN: $$X_{1n1}$$ | Modified R&C ANN: $$X_{2n1}$$ | Modified R&C ANN: survival status, CDF($$t^{(i)}$$) |
|---|---|---|---|---|---|---|---|
| Pump$$_{\mathrm{11}}$$, $$\dot{T}_{\mathrm{111}}$$ | 1 | 1 | 1 | 0 | 0.7 | 0.6 | 0.27 |
| | 2 | 1 | 1 | 0 | 0.8 | 0.7 | 0.47 |
| | 3 | 1 | 1 | 0 | 0.7 | 0.8 | 0.62 |
| | 4 | 1 | 1 | 0 | 0.9 | 0.8 | 0.72 |
| | 5 | 1 | 1 | 0 | 0.8 | 0.8 | 0.80 |
| | 6 | 1 | 1 | 1 | 0.8 | 0.9 | 0.85 |
| | 7 | - | - | - | 0.8 | 0.8 | 0.89 |
| | 8 | - | - | - | 0.8 | 0.8 | 0.92 |
| Pump$$_{\mathrm{21}}$$, $$\dot{T}_{\mathrm{211}}$$ | 1 | 1 | 0 | 0 | 0.8 | 0.2 | 0.21 |
| | 2 | 1 | 0 | 0 | 0.7 | 0.1 | 0.38 |
| | 3 | 1 | 0 | 0 | 0.9 | 0.3 | 0.51 |
| | 4 | 1 | 0 | 0 | 0.6 | 0.1 | 0.62 |
| | 5 | 1 | 0 | 0 | 0.8 | 0 | 0.70 |
| | 6 | 1 | 0 | 0 | 0.7 | 0.1 | 0.76 |
| | 7 | 1 | 0 | 0 | 0.8 | 0 | 0.81 |
| | 8 | 1 | 0 | 1 | 0.9 | 0.2 | 0.85 |

As a result, the modification of the Ravdin and Clark type of ANN is shown as an example in Table 1, in comparison with the normal model, for two pumps (1 and 2) of the same group (1) of similar covariates and for two homogenized time intervals of failures $$\dot{T}_{nkj}$$ (one for each pump). In this example, the discretized times $$t^{(i)}$$ in the intervals $$\dot{T}_{\mathrm{111}}$$ and $$\dot{T}_{\mathrm{211}}$$ are shown jointly with the covariate values. Ravdin and Clark’s ANN (R&C ANN) was trained with an equal covariate value for all the discretized times in each interval, i.e. $$X_{nk}^{(i)}=\dot{X}_{nk}$$, and it was interrupted at each specific time to failure $$T_{nkj}$$. In the modified R&C ANN, by contrast, the real covariate data $$X_{nk}^{(i)}$$ are used up to the time to failure $$T_{nkj}$$ and their average value $$\dot{X}_{nk}$$ afterwards, in this case over each homogenized time to failure $$\dot{T}_{nkj}$$. The original ANN uses a binary system, 0 or 1, as survival status, and the modified ANN uses the estimated probability of failure from a Weibull according to Equation 7 (with $$\alpha_{111}=6\cdot \mathrm{Ln}(2)^{1/1.5}$$ and $$\alpha_{211}=8\cdot \mathrm{Ln}(2)^{1/1.5}$$).
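The homogenization of an interval, with real covariates up to the failure and their average afterwards, can be sketched in a few lines. The function name `homogenize_interval` is hypothetical, and the input values are the modified R&C covariates of Pump 11 in Table 1 (failure at the sixth discretized time, intervals extended to eight); this is an illustration of the padding rule, not the authors' implementation.

```python
def homogenize_interval(covariate_rows, max_length):
    """Extend one failure interval to max_length discretized times.

    covariate_rows holds the covariate vectors recorded up to the failure;
    the rows appended afterwards repeat the per-covariate average.
    """
    if not covariate_rows:
        raise ValueError("at least one recorded row is required")
    n = len(covariate_rows)
    averages = [sum(column) / n for column in zip(*covariate_rows)]
    padding = [list(averages) for _ in range(max_length - n)]
    return [list(row) for row in covariate_rows] + padding

# Recorded covariates of Pump 11 for t = 1..6 (modified R&C columns of Table 1).
pump_11 = [[0.7, 0.6], [0.8, 0.7], [0.7, 0.8], [0.9, 0.8], [0.8, 0.8], [0.8, 0.9]]

rows = homogenize_interval(pump_11, max_length=8)
for t, row in enumerate(rows, start=1):
    print(t, [round(value, 1) for value in row])
```

Rounded to one decimal, the two appended rows match the values listed for $$t^{(i)}=7$$ and $$t^{(i)}=8$$ in Table 1, so every interval ends up with the same number of training rows.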
This modification, based on the combination of parametric and AI methods, aims to show how existing information and analysis in the plants, jointly with a monitoring system, may be used to improve decisions, mixing off-line statistical models with on-line real data from remote monitoring systems. As with any ANN, the main weak point of this model is the need to adjust the covariate values according to their representative influence on the selected failure in discrete times. That is, the values must be selected according to their influence on failure degradation over time, avoiding random and badly acquired values while keeping the correct seasonality of the data. In addition, normalization among different geographical locations is required in order to replicate the analysis. However, once the analysis is accomplished, the developed model can be applied easily in a remote monitoring system, requiring only a refinement of the model every 2 or 3 years, or when operating circumstances change radically. A set of alarms for observed abnormal tendencies may also be implemented (e.g. for the $$q$$ repair efficiency, warning about how well successive repairs conserve the system status). This could also be used as a warning about a lack of model consistency, i.e. about the need to restart the analysis with new $$T_{nkj}$$ intervals to capture new repair stages. In what follows, the proposed model is built and tested in a case study. The idea is also to implement it in a remote monitoring system.

## 3. Case study. A thermal solar plant

Thermal solar plants have been in production for more than 25 years. The current decrease in government incentives for renewable energy sources has forced companies to study possibilities for extending useful life. Potential plant re-investments must therefore also be re-evaluated, incorporating future operating and environmental conditions within equipment reliability analysis.
In these plants the combination of mechanical and thermal stresses makes reliability analysis important. This is not only because of the direct costs of failures, but also because of the significant indirect loss of profit and the associated environmental and safety risks (Ennis, 2009). By developing a model for failure prediction we can help avoid these risks. This model will be applicable to each critical failure mode separately, because symptoms and causes may differ among failure modes and equipment conditions may affect them in different ways. This point is important: efforts in failure mode analysis will be intense but worthwhile. For instance, defining suitable covariates per failure mode could add enormous value in protecting our assets and their contribution to the business.

This type of thermal solar plant is usually built modularly; the possibility of replicating the same model for different modules and regions is therefore of great interest. With that in mind, we have tried to develop an ANN model that is easy to reproduce and to update with the most common parameters found in this type of plant.

The solar thermal power plant under consideration has a nominal power of 49.9 MW with an annual production of 180 million kWh and occupies 2,700,000 square meters. It is located in the southern part of Spain and will supply energy to more than 100,000 dwellings over an operational time of 25 years. Even in the absence of sufficient solar radiation, its storage subsystem is able to provide energy for 7.5 h. All the energy produced (180 million kWh/year) is supplied through the distributor for 51 million €/year of production (at an initial price of 0.2849 €/kWh, subsequently reduced due to a legal requirement). The main subsystems of this kind of power plant are (see Fig. 1):

- Solar field, composed of 8,064 parabolic trough solar collectors with 225,792 mirrors. It heats the high-temperature oil circulated in the heat transfer fluid (HTF) loop.
- HTF loop, which conveys the high-temperature fluid to the heat exchanger in the steam generator.
- Water loop, where the steam flow is condensed, cooled and recirculated as a water flow to the steam generator.
- Steam generator, which transforms water into steam to drive the turbine.
- Turbine, which transforms the mechanical energy of the steam flow into electrical power.

Fig. 1. Thermal solar plant and functional description.

In our case study, we have selected a common system to illustrate the implementation of the model over real data and in a remote monitoring system: the electrical pumps in HTF loops. These are several large pumps in charge of making the thermal oil (Dowtherm A) flow throughout the plant. HTF pumps in a thermal solar plant tend to be viewed as a low-hazard process, even though they are heat transfer systems under pressure that could produce fire and explosion hazards, where leaks can produce a potentially flammable mist or contamination (Ennis, 2009). Moreover, their operation is critical to keeping the desired availability of the power plant, so all the production is supported by a 2$$+$$1 HTF pump configuration (two pumps working in parallel and a third as a spare). Each of the two active pumps contributes 50% of the HTF recirculation flow, so a pump failure could cost up to 50% of daily production. Therefore, HTF pumps need surveillance to ensure their efficiency and to control their deterioration. In order to optimize plant efficiency, the remote monitoring system sets the HTF pump speed through changes in the variable frequency drives, according to different temperatures, the direct beam irradiance, pressures, and also the potential fluid density.
For example, a bigger difference between inlet and outlet HTF temperatures requires less impelled flow.

For the purpose of this paper, we have selected the failure mode ‘damaged mechanical seal’, which causes significant production losses. This failure mode emerges due to many factors, such as high seal operating temperature, excessive pump vibration caused by cavitation, misalignment of parts, etc. The problem increases during the summer period, when pumps run at full load and production losses are at their highest. Potential mechanical seal failures are predicted using our back-propagation neural network, fed with the pumps’ historical data for the last 3 years. We focus our attention on failures that result from equipment deterioration due to operational and geographical (environmental) features that could have a great impact on equipment conditions. For instance, we know that extreme fluctuating cycles of inlet–outlet pressure and high temperatures can degrade the oil, producing contamination and corrosion; contraction–expansion may also result in misalignments. For this case, predicting the problem in real time, using process-control variables and a transfer function (thermodynamic approach), was found to be impossible. All representative contributions to pump degradation are therefore combined into a single (survival) function which reflects the probability of the failure mode.

In this document we show the ability of an ANN to adapt to real conditions by fitting a survival function in complex and noisy operating conditions. Our prediction models have innovative features compared with previous works in the literature. The ANN models use not only parametric estimations of the failure times, but also environmental variables, such as external humidity, and asset condition variables, such as working temperature and different operating times and cycles.
In addition, parametric methods are combined with ANN in order to develop a stable model that can be implemented easily and quickly in a remote monitoring system. Through this, early detection of degradation will be possible before failures affect production, people or the environment, and a quantitative measure of risk can be computed as a percentage.

## 4. Developing and implementing the ANN model

In order to approach the problem of real-time condition estimation that could lead to early warnings of the failure mode, the modified ANN architecture is developed from selected variables: those that our remote monitoring system can detect periodically and automatically and that most representatively show their effects on the damaged seal of HTF pumps. Specific information about the process is as follows:

- We selected two plants, with eight failures each.
- The remote monitoring variables for the input layer were:
  - Flow of HTF (l/s).
  - Working temperature of HTF (°C).
  - Ambient humidity (%).
  - Operation time of the pumps (days).
  - The modelled survival status.
  - The threshold neuron.
- Periods for comparison were selected, to detect the existence of the failure mode and the most representative variables.
- The data were reorganized, eliminating abnormal data that could distort the results. Values were normalized to the same scale for all the inputs to simplify calculations and analysis. The normalized values have to be de-normalized later, before comparison.
- A single hidden layer with nine neurons is used (fewer than twice the number of input neurons).

In summary, the implementation of our proposed model is based on two phases.

In a first step, the survival function is estimated with a parametric GRP-II Weibull over two groups of times to failure within which the covariates are the same: one over the eight failures of plant 1 and the other over the eight failures of plant 2. We then obtain a characteristic $$\alpha$$, $$\beta$$ and $$q$$ for each plant, based only on times to failure (no covariates are used at this level), as shown in Table 2. We take the Weibull CDF, keeping $$\beta_{nk}$$ in each time interval of each plant (each group) and taking $$\alpha_{nkj}=T_{nkj}\cdot \mathrm{Ln}(2)^{1/\beta_{nk}}$$. As a result, for each specific failure the probability of failure ascends from 0 towards 1, which it approaches near the maximum failure time. Table 2 shows, for each plant, the 16 failures with their times to failure ($$T_{nkj}$$) and the $$\alpha_{nkj}$$ modified by this weighting.

In a second step, the survival function is modelled for each specific failure, adapting the parametric estimation according to the covariates. For this purpose we rely on the modified R&C ANN. The discretized time $$t^{(i)}$$ inside the intervals is selected according to the influence of the covariates on the degradation of the failure mode. All the inputs (covariates and CDF) are therefore redefined with this period (notice that this requires replicating the covariates, with their average value, after each specific time to failure). In our example, for a population of six pumps (three per plant) and sixteen registered failures in three years, after filtering and reorganization, 914 discretized times are used to train the survival ANN model. Consequently, the data to train and test the GRP–ANN are reorganized as below and illustrated in Table 3 for plant 1 and failure number 3 ($$\alpha_{nkj}=148.33$$, $$\beta_{nk}=3.7$$). That is, if $$0\leqslant t^{(i)}\leqslant T_{nkj}$$ then

$$X_{nk}^{(i)}=\text{real value of vector } X_{nk}\text{ in each } t^{(i)},\qquad \mathrm{CDF}\left( t^{(i)} \right)=1-\left[ 1/\exp \left( \frac{t^{(i)}}{T_{nkj}\cdot \mathrm{Ln}(2)^{1/\beta_{nk}}} \right)^{\beta_{nk}} \right]$$

and if $$T_{nkj}<t^{(i)}\leqslant \dot{T}_{nkj}$$ then

$$X_{nk}^{(i)}=\dot{X}_{nk}\text{ of previous } t^{(i)}\text{ to } T_{nkj},\qquad \mathrm{CDF}\left( t^{(i)} \right)=1-\left[ 1/\exp \left( \frac{t^{(i)}}{T_{nkj}\cdot \mathrm{Ln}(2)^{1/\beta_{nk}}} \right)^{\beta_{nk}} \right].$$
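The survival-status target of Equation 7 can be checked numerically with a short sketch. It uses the values the paper reports for failure 3 of plant 1 ($$T_{nkj}=163.78$$, $$\beta_{nk}=3.7$$); the function names are hypothetical, and the code only reproduces the stated formulas rather than the authors' R implementation.

```python
import math

def scale_from_failure_time(t_fail, beta):
    """alpha_nkj = T_nkj * Ln(2) ** (1 / beta_nk), as stated in the text."""
    return t_fail * math.log(2.0) ** (1.0 / beta)

def survival_status(t, t_fail, beta):
    """Equation 7: Weibull CDF with the interval-specific scale alpha_nkj."""
    alpha = scale_from_failure_time(t_fail, beta)
    return 1.0 - math.exp(-((t / alpha) ** beta))

T_FAIL, BETA = 163.78, 3.7   # failure 3 of plant 1 (Table 2)

alpha = scale_from_failure_time(T_FAIL, BETA)
targets = {t: survival_status(t, T_FAIL, BETA) for t in range(10, 201, 10)}

print(round(alpha, 2))
for t, value in targets.items():
    print(t, round(value, 2))
```

Under these formulas the computed scale agrees with the $$\alpha_{nkj}$$ listed for this failure in Table 2, and the targets agree with the CDF column of Table 3 to two decimals at the points checked (for example 0.21 at $$t^{(i)}=100$$ and 0.81 at $$t^{(i)}=170$$). Note that with this choice of scale the status at $$t^{(i)}=T_{nkj}$$ equals $$1-\exp(-1/\mathrm{Ln}(2))$$, about 0.76, whatever the value of $$\beta_{nk}$$.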
Table 2 GRP Weibull parameters and reorganization for parametric estimation of survival function

| $$j$$ | Plant 1: $$T_{nkj}$$ | Plant 1: $$\alpha_{nkj}$$ | Plant 2: $$T_{nkj}$$ | Plant 2: $$\alpha_{nkj}$$ |
|---|---|---|---|---|
| 1 | 299.53 | 271.28 | 181.78 | 167.38 |
| 2 | 277.76 | 251.56 | 170.53 | 157.02 |
| 3 | 163.78 | 148.33 | 288.00 | 265.18 |
| 4 | 176.22 | 159.60 | 128.03 | 117.89 |
| 5 | 149.21 | 135.14 | 277.89 | 255.87 |
| 6 | 214.71 | 194.46 | 256.00 | 235.72 |
| 7 | 136.59 | 123.71 | 194.00 | 178.63 |
| 8 | 170.90 | 154.78 | 300.00 | 276.23 |

GRP-II group parameters: $$\alpha_{n1}=220.00$$ and $$\beta_{n1}=3.70$$ for plant 1; $$\alpha_{n2}=247.00$$ and $$\beta_{n2}=4.44$$ for plant 2.

Table 3 Reorganized survival data of failure 3 in plant 1 to train and test the ANN

| Failure | $$t^{(i)}$$ | $$X_{1nk}^{(i)}$$, Operating hours | $$X_{2nk}^{(i)}$$, Ambient humidity | $$X_{3nk}^{(i)}$$, Working temp | $$X_{4nk}^{(i)}$$, Flow | Normal Ravdin ANN | $$X_{5nk}^{(i)}$$, CDF($$t^{(i)}$$) |
|---|---|---|---|---|---|---|---|
| 3 | 10 | 10,385 | 97 | 291.00 | 324.42 | 0 | 0.00 |
| 3 | 20 | 10,395 | 99 | 301.75 | 330.27 | 0 | 0.00 |
| 3 | 30 | 10,405 | 94 | 322.50 | 341.57 | 0 | 0.00 |
| 3 | 40 | 10,415 | 98 | 301.50 | 330.14 | 0 | 0.01 |
| 3 | 50 | 10,425 | 100 | 306.25 | 332.72 | 0 | 0.02 |
| 3 | 60 | 10,435 | 84 | 307.50 | 333.40 | 0 | 0.03 |
| 3 | 70 | 10,445 | 65 | 299.25 | 328.91 | 0 | 0.06 |
| 3 | 80 | 10,455 | 70 | 318.75 | 339.53 | 0 | 0.10 |
| 3 | 90 | 10,465 | 91 | 307.75 | 333.54 | 0 | 0.15 |
| 3 | 100 | 10,475 | 80 | 321.00 | 340.75 | 0 | 0.21 |
| 3 | 110 | 10,485 | 65 | 327.75 | 344.43 | 0 | 0.28 |
| 3 | 120 | 10,495 | 77 | 330.00 | 345.65 | 0 | 0.37 |
| 3 | 130 | 10,505 | 99 | 304.00 | 331.50 | 0 | 0.46 |
| 3 | 140 | 10,515 | 86 | 326.12 | 343.54 | 0 | 0.55 |
| 3 | 150 | 10,525 | 87 | 324.13 | 342.46 | 0 | 0.65 |
| 3 | 160 | 10,535 | 72 | 326.40 | 343.69 | 0 | 0.73 |
| 3 | 170 | 10,545 | 85 | 313.48 | 336.66 | 1 | 0.81 |
| 3 | 180 | 10,555 | 85 | 313.48 | 336.66 | 1 | 0.87 |
| 3 | 190 | 10,565 | 85 | 313.48 | 336.66 | 1 | 0.92 |
| 3 | 200 | 10,575 | 85 | 313.48 | 336.66 | 1 | 0.95 |

The output of the ANN model is thus the estimated probability of failure, developed from the GRP-II model, with the effect of the covariates roughly proportional to the Weibull survival probability. The ANN analysis, going through the processes of training, prediction and testing, was carried out on the data set of variables in Table 4.

Table 4 Data set of variables

| Variables of vector $$X_{nk}$$ | Max. | Ref. | Min. | Unit |
|---|---|---|---|---|
| $$X_{1nk}$$, Operating hours | 20,000 | 15,000 | 0 | h |
| $$X_{2nk}$$, Flow | 600 | 300 | 55 | kg/s |
| $$X_{3nk}$$, Ambient humidity | 100 | 75 | 25 | % |
| $$X_{4nk}$$, Temperature | 400 | 300 | 290 | $$^{\circ}$$C |
| $$X_{5nk}$$, Survival function | 1 | 0.5 | 0 | |

The learning algorithm parameters were as follows: (a) maximum number of cycles $$=$$ 1000, (b) maximum validation failures $$=$$ 40, (c) min_grad $$=$$ 1.0e$$-$$10, (d) goal $$=$$ 0, (e) $$\mu = 0.005$$, (f) $$\mu$$_dec $$=$$ 0.1, (g) $$\mu$$_inc $$=$$ 10, (h) $$\lambda = 0$$, (i) min Error $$=$$ 0.00001833. The results obtained in this case indicate a well-optimized model, as shown in Table 5, which gives the results of the model training process. The mean square error (MSE), in training and testing, validates the ANN; it measures the average squared distance between the predictions obtained and the real values.

Table 5 Results of training in developed model

| Results | Value |
|---|---|
| MSE training | 90.02918 |
| MSE test | 335.9361 |
| $$R^{\mathrm{2}}$$ training | 0.948243 |
| $$R^{\mathrm{2}}$$ test | 0.8262953 |

By contrast, had we used the Ravdin and Clark model directly, the results would have been less accurate (as Table 6 shows).
Table 6 Results of training with Ravdin and Clark

| Results | Value |
|---|---|
| MSE training | 352.2306 |
| MSE test | 492.8387 |
| $$R^{\mathrm{2}}$$ training | 0.8590198 |
| $$R^{\mathrm{2}}$$ test | 0.7971156 |

In the developed model, $$R^{\mathrm{2}}$$ is consistent with this result, explaining 94.8% of the variance in training. Figures 2 and 3 represent the deduced predictions. Figure 2 shows the training of both the Ravdin and Clark ANN and the survival ANN, with the predictions drawn as dashed lines: (2a) for the normal R&C ANN and (2b) for the modified R&C ANN. Figure 3 has a straight line to indicate the best approximation for error minimization. For validation purposes, 25% of the historical data is used to estimate the generalization error.

Fig. 2. (a) Normal R&C ANN training and (b) modified R&C ANN training. In both graphs solid lines are the modelled CDF and dashed lines are the CDF predicted by the ANNs.

Fig. 3. Modified R&C ANN CDF predictions, $$Y\prime ^{(i)}$$, versus modelled CDF, $$Y^{(i)}$$.

This case study has generated a good prediction of a real failure based on 3 years of data. Using less than 3 years of data is possible, but deducing the relationships between covariates and degradation may then be difficult, and the seasonal behaviour of some covariates would degrade future predictions. Our recommendation is to employ more than 2 years of data when there is environmental influence. However, the number of discretized times in other applications may be smaller if the covariates are more stable over time.
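The MSE and $$R^{\mathrm{2}}$$ figures reported in the tables above, together with the 75%/25% split between training and testing, can be sketched as follows. This is a minimal illustration, not the authors' R code: the function names are hypothetical, and splitting the rows in their stored order is an assumption, since the text does not say how the 25% was selected.

```python
def mse(y_true, y_pred):
    """Mean square error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def split_train_test(rows, train_fraction=0.75):
    """Split rows into a training part and a testing part.

    Assumption: rows are taken in their stored order; a random or
    stratified split would be an equally plausible reading of the text.
    """
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]
```

Computing both metrics separately on the training part and on the testing part gives the four values listed for each model, which makes the two models directly comparable.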
Returning now to preventive maintenance, our mathematical tool allows one to implement an intelligent preventive maintenance strategy. The strategy is self-adaptive to observed imperfection in repairs and to the influence of selected covariates on potential failures. Finally, the strategy can trigger a preventive maintenance action according to two possible business rules:

- A rule based on a determined level of confidence or failure probability as a general reference, expressed as a proportion of CDF($$t^{(i)})$$. In our case study, the level of estimated CDF($$t^{(i)})$$ which triggers preventive maintenance is 0.6.
- A rule based on risk-cost-benefit analysis. In this case, we consider not only the failure probability but also the expected cost, looking for its minimum value:

$$C(t^{(i)})=\frac{\mathrm{CDF}(t^{(i)})}{t^{(i)}}\cdot \mathrm{Cost}_{\mathrm{corrective}}+\left[\frac{1-\mathrm{CDF}(t^{(i)})}{t^{(i)}}\right]\cdot \mathrm{Cost}_{\mathrm{preventive}};$$

that is, we consider the risk of being preventive several steps ahead, and the risk of waiting for the failure (due to corrective unavailability).

For the second business rule, the last step of the methodology is to obtain an on-line economic estimation of risk, as in Fig. 4. The idea is to determine the optimal interval between preventive actions ($$t^{(i)})$$ (Campbell & Jardine, 2001) that minimizes the total expected cost of the equipment maintenance per unit time. To do so, the criterion governing the release of the PM action is determined by comparing the economic value of risk (for a specific period to be selected) of the following two maintenance strategies.

Strategy 1: Doing preventive maintenance as soon as possible. This would restore the equipment to a certain condition, minimizing the risk of a failure for a certain period (in this case study this condition is reached by updating only the covariate values to normal equipment operating condition values), but would cost the price of the corresponding preventive maintenance activity (in our case Cost$$_{\mathrm{preventive}}(t^{(i)}) =$$ 14,500 €).
This calculation is computed on-line and compared to the risk of the second strategy.

Strategy 2: Doing nothing. The economic value of this strategy is calculated by computing only the on-line risk of doing corrective maintenance when failure takes place (in this case with a higher probability than if we follow Strategy 1),

$$\mathrm{Cost}_{\mathrm{corrective}}(t^{(i)})=8{,}000\ \text{€}+\left(5{,}822\ \tfrac{\text{€}}{\mathrm{h}}\cdot 8\ \mathrm{h}\right)=90{,}440\ \text{€}$$

(where the average direct cost of a corrective action is considered to be about 8,000 € and the indirect cost 82,440 €, estimated as loss of profit of 5,822 €/h with an MTTR $$=$$ 8 h).

Fig. 4. Risk-cost analysis based on expected costs, searching for the right time to trigger preventive action. (a) Solid line is the minimum expected cost, (b) dashed line is the expected preventive cost and (c) dotted line is the expected corrective cost.

When the on-line risk of doing nothing (Strategy 2) exceeds the risk of the PM activity (Strategy 1) to a certain extent, PM is automatically released and accomplished. The extent by which the risk must be exceeded is understood here as a company policy. Figure 4 shows this concept. For instance, between $$t^{(i)} = 120$$ and 140 days the Strategy 2 risk increases more than the Strategy 1 (PM) risk; PM would then be scheduled and released. Note that decisions are therefore taken on-line, based on the risk figures of each strategy. Finally, the repercussions of the chosen prediction model have to be evaluated with a cost-benefit analysis, prior to their implementation and communication to the entire organization. The most vulnerable (and/or sensitive) points of these pumps are mechanical seals.
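The two business rules and the comparison between strategies described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the expected cost per unit time is read as CDF/t times the corrective cost plus (1 - CDF)/t times the preventive cost, the `margin` parameter is a hypothetical stand-in for the company policy on how far the corrective risk must exceed the preventive risk, and all function names are invented for the example. The two cost figures are the totals stated in the case study.

```python
COST_PREVENTIVE = 14_500.0   # EUR, preventive activity cost from the case study
COST_CORRECTIVE = 90_440.0   # EUR, total corrective cost as stated in the case study


def expected_costs(t, cdf):
    """Expected corrective and preventive cost per unit time at time t.

    Assumed reading of the cost expression: each term is divided by t.
    """
    corrective = cdf / t * COST_CORRECTIVE
    preventive = (1.0 - cdf) / t * COST_PREVENTIVE
    return corrective, preventive


def threshold_rule(t, cdf, limit=0.6):
    """Rule 1: trigger PM once the estimated CDF reaches a fixed level."""
    return cdf >= limit


def risk_rule(t, cdf, margin=1.0):
    """Rule 2: trigger PM once the corrective risk exceeds the preventive
    risk by the policy margin (margin is a hypothetical policy parameter)."""
    corrective, preventive = expected_costs(t, cdf)
    return corrective > margin * preventive


def first_trigger(times, cdfs, rule):
    """Return the first time at which the rule fires, or None if it never does."""
    for t, cdf in zip(times, cdfs):
        if rule(t, cdf):
            return t
    return None
```

Feeding the functions with the sequence of discretized times and the CDF values predicted by the ANN gives the time at which each rule would release the preventive action; a larger `margin` delays the release.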
The mechanical seals are responsible for preventing fluid leakage (dangerous fluid at high temperature and pressure). Thanks to this research, the risk associated with the 'damaged seals' failure mode could be reduced by 247,319 €/plant a year, with an estimated potential impact on the life cycle of the plant (25 years) of 7.24 M€.

### 5. Conclusions

Thermal solar plant managers want to ensure longer profitability periods with more reliable plants. To ensure profitability during the life cycle of the plant we must ensure the reliability of critical equipment and the maximum extension of its life cycle; otherwise failure costs will penalize the expected profit. Throughout this document, we suggest applying one ANN model per failure mode and we encourage a practical implementation in SCADA systems for different plants. This methodology may ease and improve decision-making and risk modelling, enabling reductions in the direct and indirect costs of corrective maintenance, or allowing the display of residual life until total equipment failure. When enough data for significant training is available, a better implementation of our methodology will help to reduce costs and will improve the knowledge of the life cycle of the plant under non-homogeneous operational and environmental conditions. The capacity of an ANN for self-learning from data sources (sometimes noisy or affected by communication losses), thanks to reiterative memory, is important. In our case study, we had a vast quantity of data, although sometimes this data was affected by problems with sensor readings or communications. A back-propagation perceptron ANN is recommended for automation developments with real-time utilization. Furthermore, advanced ANN models could be applied when supporting additional variables.

### Acknowledgements

The authors would like to acknowledge the support of the Scientific Chair of MM BinLadin for Operation and Maintenance Technology at Taibah University, Madina, Saudi Arabia.
### Funding

SMARTSOLAR Project (OPN—INNPACTO-Ref IPT-2011-1282-920000).

### References

Biganzoli, E., Boracchi, P., Mariani, L. & Marubini, E. (1998) Feed forward neural networks for the analysis of censored survival data: a partial logistic regression approach. Stat. Med., 17, 1169–1186.

Campbell, J. D. & Jardine, A. K. (2001) Maintenance Excellence: Optimizing Equipment Life-Cycle Decisions. New York: M. Dekker. ISBN 9781420029406.

Caner, M., Gedik, E. & Keçebaş, A. (2011) Investigation on thermal performance calculation of two type solar air collectors using artificial neural network. Expert Syst. Appl., 38, 1668–1674.

Cohen, P. R. & Feigenbaum, E. A. (eds) (2014) The Handbook of Artificial Intelligence, vol. 3. Oxford: Butterworth-Heinemann. ISBN 9781483214399.

Cox, D. R. (1972) Regression models and life-tables. J. R. Stat. Soc. Series B (Methodological), London: Wiley, 34, 187–220.

Cox, D. R. & Oakes, D. (1984) Analysis of Survival Data, Monographs on Statistics & Applied Probability, vol. 21. United States: Chapman & Hall, CRC Press. ISBN 9780412244902.

Curry, B., Morgan, P. & Beynon, M. (2000) Neural networks and flexible approximations. IMA J. Manag. Math., 11, 19–35.

Dagpunar, J. S. (1997) Renewal-type equations for a general repair process. Qual. Reliab. Eng. Int., 13, 235–245.

Ennis, T. (2009) Safety in design of thermal fluid heat transfer systems. Symposium series (Rugby ed.). Hazards XXI: process safety and environmental protection, Institution of Chemical Engineers, vol. 155, 162–169.

Faraggi, D. & Simon, R. (1995) A neural network model for survival data. Stat. Med., 14, 73–82.

González-Prida, V., Barberá, L., Márquez, A. C. & Fernández, J. G. (2014) Modelling the repair warranty of an industrial asset using a non-homogeneous Poisson process and a general renewal process. IMA J. Manag. Math., 26, 171–183. doi: 10.1093/imaman/dpu002.

Hougaard, P. (2012) Analysis of multivariate survival data. New York: Springer Science and Business Media.

Kalogirou, S. (2007) Artificial intelligence in energy and renewable energy systems. New York: Nova Publishers.

Kuo, C. (2011) Cost efficiency estimations and the equity returns for the US public solar energy firms in 1990–2008. IMA J. Manag. Math., 22, 307–321.

Lapedes, A. & Farber, R. (1987) Nonlinear signal processing using neural networks: prediction and system modelling. IEEE international conference on neural networks, San Diego, CA, USA (No. LA-UR-87-2662; CONF-8706130-4).

Lawrence, S., Giles, C. L. & Tsoi, A. C. (1998) What size neural network gives optimal generalization? Convergence properties of backpropagation.

Lee, E. T. & Wang, J. (2003) Statistical methods for survival data analysis, vol. 476. New Jersey: Wiley.

Liestøl, K., Andersen, P. K. & Andersen, U. (1994) Survival analysis and neural nets. Stat. Med., 13, 1189–1200.

Malcolm, B., Bruce, C. & Morgan, P. (1999) Neural networks and finite-order approximations. IMA J. Manag. Math., 10, 225–244.

Martín, L., Zarzalejo, L. F., Polo, J., Navarro, A., Marchante, R. & Cony, M. (2010) Prediction of global solar irradiance based on time series analysis: application to solar thermal power plants energy production planning. Sol. Energ., 84, 1772–1781.

Mellit, A., Benghanem, M., Arab, A. H. & Guessoum, A. (2005) An adaptive artificial neural network model for sizing stand-alone photovoltaic systems: application for isolated sites in Algeria. Renew. Energy, 30, 1501–1524.

Ohno-Machado, L. (2001) Modeling medical prognosis: survival analysis techniques. J. Biomed. Inform., 34, 428–439.

Ravdin, P. M. & Clark, G. M. (1992) A practical application of neural network analysis for predicting outcome of individual breast cancer patients. Breast Cancer Res. Treat., 22, 285–293.

Rosenblatt, F. (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev., 65, 386.

Smith, P. J. (2002) Analysis of failure and survival data. United States: Chapman & Hall, CRC Press. ISBN 9781584880752.

Xiang, A., Lapuerta, P., Ryutov, A., Buckley, J. & Azen, S. (2000) Comparison of the performance of neural network methods and Cox regression for censored survival data. Comput. Stat. Data Anal., 34, 243–257.

© The authors 2016. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.

IMA Journal of Management Mathematics, Oxford University Press

# Analysis of dynamic reliability surveillance: a case study

Volume 29 (1) – Jan 1, 2018
15 pages

Publisher: Oxford University Press
ISSN: 1471-678X
eISSN: 1471-6798
DOI: 10.1093/imaman/dpw011

### Abstract

In this paper a reliability model based on artificial neural networks and the generalized renewal process is developed. The model is used for failure prediction, and is able to dynamically adapt to changes in the operating and environmental conditions of assets. The model is implemented for a thermal solar power plant, focusing on critical elements of these plants: heat transfer fluid pumps. We affirm that this type of model can be easily automated within the plant’s remote monitoring system. Using this model we can dynamically assign reference values for warnings and alarms and provide predictions of asset degradation. These in turn can be used to evaluate the associated economic risk to the system under existing operating conditions and to inform preventive maintenance activities.

### 1. Introduction

The consistency of reliability predictions in the renewable energy sector is important because of the high impact of production losses. These predictions are complex to generate, because assets’ operating conditions change seasonally and geographically. In this sector, the optimization of maintenance programs for an asset must consider operational variables (configurations, preventive maintenance, undue handling, etc.) as well as environmental conditions (cleanliness, fastening, temperature, etc.). Reliability estimations that take these contributing factors into account may therefore be informative. Besides this, the need to update these estimations over time, and the proper consideration of explanatory variables or covariates, is critical to predict time to system failure. There are many techniques for survival analysis and estimation (Cox & Oakes, 1984; Smith, 2002) that use explanatory variables.
These techniques can be parametric when the failure distributions are known, semi-parametric in the case of unknown failure distribution but with defined assumptions of proportionality with time covariates (independent among them), or non-parametric when the failure distributions are not specified (Lee & Wang, 2003, Hougaard, 2012). Flexibility and complexity of computational implementation increase from parametric to non-parametric methods (Ohno-Machado, 2001). In the search for efficiency, artificial intelligence (AI) can be used to solve, computationally, these prediction problems (Kalogirou, 2007; Cohen & Feigenbaum, 2014), and well-accepted and recommended methods for these purposes are artificial neural network (ANN) models (Mellit et al., 2005; Martín et al., 2010; Caner et al., 2011; Kuo, 2011). These networks are composed of multiple, connected units (neurons). The standard ANN architecture is one input layer, one output layer and generally, one or more hidden layers. To achieve auto-adjustment, an often used ANN is the backpropagation neural network (BANN) which prevents overtraining. This technique filters the noise and recognizes the most overt and accessible patterns, overcoming in an order of magnitude the linear conventional methods and the polynomial methods (Lapedes & Farber, 1987). Consequently, BANN provides a strong tolerance to noisy data, due to the storage of redundant information, and adaptation in the presence of explicit knowledge for the resolution of problems (Malcolm et al., 1999; Curry et al., 2000). In our proposed prediction model, the neural network model under consideration is a feed-forward single layer perceptron (Rosenblatt, 1958), which is composed of one input layer with $$P$$ neurons, an intermediate or hidden layer with $$M$$ neurons, and an output layer with 1 neuron, $$Y$$. The neural network output, $$Y$$, is a function based on a linear combination of a set of weights with the outputs from the intermediate layer neurons. 
All the weights used in the linear combination are learned by a back-propagation algorithm (Lawrence et al., 1998), in which the sum of squared errors

$$R^{2}=\sum_{i=1}^{N}R_{i}^{2}=\sum_{i=1}^{N}\left( Y^{(i)}-f\left( X^{(i)} \right) \right)^{2} \quad (1)$$

is minimized, seeking a global minimum, by a loop with a maximum of $$S$$ steps, where in each step $$s$$ a forward and a backward pass are executed over the $$N$$-element training set $$\mathrm{TS}=\left\{ \left( X^{\left( i \right)},Y^{\left( i \right)} \right)\vert i\in \left\{ 1,... ,N \right\} \right\}$$, where $$X^{(i)}=\left( X_{1}^{(i)},X_{2}^{(i)},... ,X_{P}^{(i)} \right)$$ and $$Y^{(i)}=\left( Y_{1}^{(i)},Y_{2}^{(i)},... ,Y_{P}^{(i)} \right)$$ are the input and output values, respectively, from the training set. This paper is fundamentally concerned with the use of a BANN to obtain predictions about failures before they occur. For this purpose, the paper is structured in three main parts: Section 2 develops our prediction model; Sections 3 and 4 present a case study; and a final section is dedicated to the conclusions of the paper.

### 2. Proposed prediction model

In this section we assume that there are sufficient experimental data about failures. Due to their experimental nature, these raw data do not follow a formal parametric failure distribution function, and in addition they have non-linear correlations with the covariates that accelerate the appearance of failures. To simplify, we assume these covariates are mutually independent and not time-dependent.
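The forward and backward passes that minimize the sum of squared errors in equation (1) can be sketched for a network with one hidden layer and a single linear output. This is a minimal illustration only: it uses plain per-sample gradient descent rather than the Quasi-Newton optimizer the paper reports using in R, and the function names, learning rate, step count and seed are hypothetical choices.

```python
import math
import random


def sigmoid(x):
    """Logistic activation used for the hidden neurons."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Forward pass. Each weight row stores its bias as the last element."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1])
              for row in w_hidden]
    y_hat = sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1]
    return hidden, y_hat


def sse(data, w_hidden, w_out):
    """Sum of squared errors over the training set, as in equation (1)."""
    return sum((y - forward(x, w_hidden, w_out)[1]) ** 2 for x, y in data)


def train(data, n_hidden, steps=2000, lr=0.05, seed=0):
    """Train with per-sample gradient descent (an assumed, simple optimizer)."""
    rng = random.Random(seed)
    p = len(data[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(p + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(steps):
        for x, y in data:
            hidden, y_hat = forward(x, w_hidden, w_out)
            err = y_hat - y
            # Backward pass: hidden deltas use the output weights before update.
            deltas = [err * w_out[m] * hidden[m] * (1.0 - hidden[m])
                      for m in range(n_hidden)]
            for m in range(n_hidden):
                w_out[m] -= lr * err * hidden[m]
            w_out[-1] -= lr * err
            for m in range(n_hidden):
                for j in range(p):
                    w_hidden[m][j] -= lr * deltas[m] * x[j]
                w_hidden[m][-1] -= lr * deltas[m]
    return w_hidden, w_out
```

Calling `train` with `steps=0` returns the initial random weights, so the error before and after training can be compared with `sse` on the same data.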
Under the assumptions of this section, maintenance decision making for renewable energy equipment under different operating environments can be supported by ANN, with the following benefits:

- Suitable to the managed amount of failure data: the more information provided, the more relevant the result becomes, as a form of continuous improvement.
- Implementable in remote monitoring systems, pursuing automation not only of prediction but also of self-adjustment based on real data.
- Flexible to changes, whether through the elimination of a covariate, the incorporation of new covariates in the input layer, or the hierarchical combination of several ANNs.

Numerous papers (Ravdin & Clark, 1992; Liestøl et al., 1994; Faraggi & Simon, 1995; Biganzoli et al., 1998; Xiang et al., 2000; Ohno-Machado, 2001) compare several existing methods for fitting survival functions that relate reliability to the covariates; for instance, comparing the widely applied semi-parametric Cox proportional hazards model (PHM) (Cox, 1972) with several ANN models. When the model complexity is low, based on a few covariates with proportional relations to reliability, there are no significant differences between the predictions of Cox regression and ANN models. For complex models with many covariates and interaction terms, the predictions of ANN models have important advantages over Cox regression models. In practice, depending on the way the covariates are considered, different ANN models can be built; for instance:

- An ANN can be used instead of the linear combination of weight coefficients in the Cox PHM, as in Faraggi & Simon (1995). In this case it is necessary to solve the PHM using Partial Maximum Likelihood Estimation (P-MLE).
- An input with the survival status over disjoint time intervals can be used, with the covariate values replicated. A binary variable takes the value 0 before the interval of the failure and 1 in the event of failure or later, as in Liestøl et al. (1994). In this case, each time interval is an input with a survival status, so a vector of survival statuses is defined per failure.
- The Kaplan–Meier (K–M) estimator can be employed to define the time intervals as two additional inputs instead of a vector: one is the sequence of time intervals defined by the K–M estimator, and the other is the survival status at each time of the sequence. This is the case of Ravdin & Clark (1992) and Biganzoli et al. (1998), models that are known as proportional K–M.

Since the aim of this research is to improve the results of parametric methods by combining them with the self-adaptive property of ANN, the proposed survival ANN model is based on the ideas of Ravdin & Clark (1992). At the same time, some mathematical modifications are introduced for simplification and to facilitate reliability surveillance (see Table 1). With these modifications, the application of the proposed model is sequenced in two phases. In a first phase, for easy understanding of results and so that other reliability analyses can be applied to the times to failure, the reuse of parametric method outcomes is recommended, as a previous estimation without the consideration of covariates. This gives a parametric estimation of the survival curve. Instead of the K–M estimator, the General Renewal Weibull Process II (GRP-II) method is selected in order to fit the curve better and to reduce the negative effect of a non-monotonically decreasing survival curve. In our study, we also propose to combine ANN with the GRP-II parametric model, which evaluates survival probability in repairable and non-repairable systems, and models the repair efficiency in avoiding the overall damage produced by all the successive failures (González-Prida et al., 2014).
The GRP-II model is more accurate than GRP-I for complex systems, or with data from multiple devices of the same type (Dagpunar, 1997). GRP-II gives three parameters: the Weibull scale $$\alpha$$, the Weibull shape $$\beta$$ and the repair efficiency $$q$$. The $$q$$ parameter captures the recurrence rate of failures, showing the effects of the repairs on the age of system $$n$$ (modelling partial renewal repair). Properly performed repairs (0 < $$q_{n}$$ < 1) may improve the virtual state (life) of the system, while poorly solved failures ($$q_{n}$$ > 1) could aggravate it (always speaking in terms of reliability). The virtual age of system $$n$$ is then updated after a failure $$j$$ according to equation 2 (where $$T_{nj}$$ are the time intervals between successive failures), and the failure probability distribution conditioned on survival up to the new virtual age is calculated in equation 3:

$$V_{n}^{\mathrm{new}}=q_{n}\cdot \left( V_{n}^{\mathrm{old}}+T_{nj} \right) \quad (2)$$

$$F\left( t\vert V_{n}^{\mathrm{new}} \right)=P\left[ T_{nj}\leqslant t\vert T_{nj}>V_{n}^{\mathrm{new}} \right]=\frac{F(t)-F\left( V_{n}^{\mathrm{new}} \right)}{1-F\left( V_{n}^{\mathrm{new}} \right)} \quad (3)$$

Based only on the times to failure ($$T_{nj})$$, GRP-II is applied over $$k$$ groups of failures with similar covariate values, for example one group per plant. The parameters $$\alpha$$, $$\beta$$ and $$q$$ are thus obtained for each group without using the covariates. For a Weibull distribution, equation 3 then gives the conditional survival probability

$$1-F\left( t\vert V_{nk}^{\mathrm{new}} \right)=\exp \left[ \left( \frac{V_{nk}^{\mathrm{new}}}{\alpha_{nk}} \right)^{\beta_{nk}}-\left( \frac{t}{\alpha_{nk}} \right)^{\beta_{nk}} \right] \quad (4)$$

After that, in order to adapt the parametric estimation of the survival function according to the covariates, the back-propagation ANN is used with the following criteria. A sigmoid (logistic) function

$$f(x)=\frac{e^{x}}{1+e^{x}}, \quad (5)$$

over variables normalized between 0 and 1, is used for the hidden layer neurons, and a linear combination of their outputs determines the neural network output. It is not advisable to use more than twice the number of input neurons in the hidden layer (Lawrence et al., 1998).
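The virtual age update of equation 2 and the conditional failure probability of equation 3 can be sketched for a Weibull distribution. This is a minimal illustration with hypothetical function names and parameter values; fitting $$\alpha$$, $$\beta$$ and $$q$$ from the failure history, which is what GRP-II actually does, is not shown.

```python
import math


def weibull_cdf(t, alpha, beta):
    """Unconditional Weibull CDF with scale alpha and shape beta."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def update_virtual_age(v_old, t_between, q):
    """Equation 2: virtual age after a repair with efficiency q."""
    return q * (v_old + t_between)


def conditional_cdf(t, v_new, alpha, beta):
    """Equation 3: probability of failing by age t given survival up to v_new."""
    f_v = weibull_cdf(v_new, alpha, beta)
    return (weibull_cdf(t, alpha, beta) - f_v) / (1.0 - f_v)


def conditional_survival(t, v_new, alpha, beta):
    """Closed form exp[(v/alpha)^beta - (t/alpha)^beta], the complement of equation 3."""
    return math.exp((v_new / alpha) ** beta - (t / alpha) ** beta)
```

With `q = 0` the repair resets the virtual age to zero and the conditional CDF reduces to the ordinary Weibull CDF, while `q = 1` keeps the full accumulated age.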
Discretized times inside the intervals $$T_{nkj}$$, over $$k$$ groups of failures with similar covariate values $$X_{nk}$$, are defined according to the available and representative information for the selected failure. So that the model can be updated iteratively with real data for all the intervals, the real covariate values are used as inputs $$X_{nk}^{(i)}$$, but only up to the time the failure occurs, after which the average covariate value is taken, $$X_{nk}^{(i)}=\dot{X}_{nk}$$. To homogenize the selected intervals $$T_{nkj}$$, all of them are extended after the failure time, in discretized times $$t^{(i)}$$ of the training set, up to the maximum length of the intervals (which corresponds to the maximum time to failure), so that all the resulting intervals ($$\dot{T}_{nkj})$$ have the same length.

$$\mathrm{TS}_{nk}=\left\{ \left( X_{nk}^{(i)},Y_{nk}^{(i)} \right)\vert i\in \left\{ 1,\ldots ,N \right\} \right\} \quad (6)$$

In our model, the additional input of the survival status increases gradually from 0 to 1 at the failure event or after each specific time to failure. To do this, the Weibull Cumulative Distribution Function (CDF) can be used for all intervals, with the $$\beta_{nk}$$ previously obtained per group by the GRP-II method, in order to maintain the shape of the predictions. In addition, to obtain proportionality in the gradual increment from 0 to 1 of the survival status, the Weibull scale $$\alpha_{nk}$$ in each interval is adapted to the specific time interval $$T_{nkj}$$ (between successive failures), weighted by the median life.
Therefore, the output layer contains a single output neuron corresponding to the estimated survival status (probability of failure), which rises from 0 to 1 up to the time to failure and beyond; see equation 7, where for all the intervals $$\beta_{nk}$$ is obtained by GRP-II for each group of times to failure, and $$\alpha_{nkj}=T_{nkj}\cdot \mathrm{Ln}(2)^{1/\beta_{nk}}$$:

$$\mathrm{CDF}\left( t^{(i)} \right)=1-\left[ 1\Big/ \exp \left( \left( \frac{t^{(i)}}{\alpha_{nkj}} \right)^{\beta_{nk}} \right) \right]=1-\left[ 1\Big/ \exp \left( \left( \frac{t^{(i)}}{T_{nkj}\cdot \mathrm{Ln}(2)^{1/\beta_{nk}}} \right)^{\beta_{nk}} \right) \right] \quad (7)$$

The training of the network, which gives us the network settings, is carried out on 75% of the available data; the other 25% is used for network testing in order to subsequently validate the behaviour pattern. The back-propagation learning algorithm used is a supervised error correction, minimizing the penalized mean square error through the Quasi-Newton method in the free software R.

$$\mathrm{RMSE}=\sqrt{\frac{\sum_{i=1}^{N}\left( Y^{(i)}-f\left( X^{(i)} \right) \right)^{2}}{N}} \quad (8)$$

Table 1 Training set of normal and modified R&C ANN

| Interval | $$t^{(i)}$$ | Normal R&C ANN: $$X_{1n1}$$ | Normal R&C ANN: $$X_{1n1}$$ | Normal R&C ANN: Survival status | Modified R&C ANN: $$X_{1n1}$$ | Modified R&C ANN: $$X_{1n1}$$ | Modified R&C ANN: Survival status, CDF($$t^{(i)})$$ |
|---|---|---|---|---|---|---|---|
| Pump$$_{\mathrm{11}}$$ $$\dot{T}_{\mathrm{111}}$$ | 1 | 1 | 1 | 0 | 0.7 | 0.6 | 0.27 |
| | 2 | 1 | 1 | 0 | 0.8 | 0.7 | 0.47 |
| | 3 | 1 | 1 | 0 | 0.7 | 0.8 | 0.62 |
| | 4 | 1 | 1 | 0 | 0.9 | 0.8 | 0.72 |
| | 5 | 1 | 1 | 0 | 0.8 | 0.8 | 0.80 |
| | 6 | 1 | 1 | 1 | 0.8 | 0.9 | 0.85 |
| | 7 | - | - | - | 0.8 | 0.8 | 0.89 |
| | 8 | - | - | - | 0.8 | 0.8 | 0.92 |
| Pump$$_{\mathrm{21}}$$ $$\dot{T}_{\mathrm{211}}$$ | 1 | 1 | 0 | 0 | 0.8 | 0.2 | 0.21 |
| | 2 | 1 | 0 | 0 | 0.7 | 0.1 | 0.38 |
| | 3 | 1 | 0 | 0 | 0.9 | 0.3 | 0.51 |
| | 4 | 1 | 0 | 0 | 0.6 | 0.1 | 0.62 |
| | 5 | 1 | 0 | 0 | 0.8 | 0 | 0.70 |
| | 6 | 1 | 0 | 0 | 0.7 | 0.1 | 0.76 |
| | 7 | 1 | 0 | 0 | 0.8 | 0 | 0.81 |
| | 8 | 1 | 0 | 1 | 0.9 | 0.2 | 0.85 |

As a result, the modification of the Ravdin and Clark type of ANN is shown as an example in Table 1, in comparison with the normal model, for two pumps (1 and 2) of the same group (1) of similar covariates and for two homogenized time intervals of failures $$\dot{T}_{nkj}$$ (one for each pump). In this example, the discretized times $$t^{(i)}$$ in the intervals $$\dot{T}_{\mathrm{111}}$$ and $$\dot{T}_{\mathrm{211}}$$ are shown jointly with the covariate values. Ravdin and Clark’s ANN (R&C ANN) was trained with an equal covariate value for all the discretized times in each interval, that is $$X_{nk}^{(i)}=\dot{X}_{nk}$$, and it was interrupted at each specific time to failure $$T_{nkj}$$. In the modified R&C ANN, by contrast, the real covariate data $$X_{nk}^{(i)}$$ are used up to the time to failure $$T_{nkj}$$ and their average value $$\dot{X}_{nk}$$ afterwards, in this case over each homogenized time to failure $$\dot{T}_{nkj}$$. The original ANN uses a binary system, 0 or 1, as survival status, and the modified ANN uses the estimated probability of failure from a Weibull distribution according to equation 7 (with $$\alpha_{111}=6\cdot \mathrm{Ln}(2)^{1/1.5}$$ and $$\alpha_{211}=8\cdot \mathrm{Ln}(2)^{1/1.5}$$).
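The construction of a modified R&C training interval described above (real covariates up to the failure, their average afterwards, and a Weibull CDF as the target) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the scale parameter is taken exactly as printed in the text, and the covariate values in the usage note are the Pump 11 inputs from Table 1.

```python
import math


def weibull_scale(t_fail, beta):
    """Scale parameter as printed in the text: alpha = T * Ln(2)^(1/beta)."""
    return t_fail * math.log(2.0) ** (1.0 / beta)


def cdf_target(t, t_fail, beta):
    """Equation 7: Weibull CDF used as the gradual survival status."""
    alpha = weibull_scale(t_fail, beta)
    return 1.0 - 1.0 / math.exp((t / alpha) ** beta)


def build_training_rows(covariates, t_fail, horizon, beta):
    """Build rows (t, covariate vector, CDF target) for t = 1..horizon.

    covariates holds one observed vector per discretized time up to t_fail;
    after the failure the column-wise average is used instead.
    """
    n = len(covariates[0])
    mean = [sum(row[j] for row in covariates) / len(covariates) for j in range(n)]
    rows = []
    for t in range(1, horizon + 1):
        x = covariates[t - 1] if t <= t_fail else mean
        rows.append((t, list(x), cdf_target(t, t_fail, beta)))
    return rows
```

For example, `build_training_rows([[0.7, 0.6], [0.8, 0.7], [0.7, 0.8], [0.9, 0.8], [0.8, 0.8], [0.8, 0.9]], 6, 8, 1.5)` extends a six-step interval to the homogenized length of eight steps, holding the covariates at their averages for the last two rows.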
This modification, based on the combination of parametric and AI methods, aims to show how existing information and analysis in the plants, jointly with a monitoring system, may be used to improve decisions, mixing off-line statistical models with on-line real data from remote monitoring systems. As with any ANN, the main weak point of this model is the need to adjust the covariate values according to their representative influence on the selected failure at discrete times. That is, depending on their influence on failure degradation over time, random and badly acquired values must be avoided while the correct data seasonality is kept. Besides this, normalization among different geographical locations is required in order to replicate the analysis. However, once the analysis is accomplished, the developed model can be applied easily in a remote monitoring system, requiring model refinement only every 2 or 3 years, or when operating circumstances change radically. A set of alarms for observed abnormal tendencies may also be implemented (e.g. for the repair efficiency $$q$$, warning about the tendency of successive repairs with respect to the conservation of the system status). This could also be used as a warning about a lack of model consistency, i.e. about the need to restart the analysis with new $$T_{nkj}$$ intervals to capture new repair stages. In what follows, the proposed model is built and tested in a case study. The idea is also to implement it in a remote monitoring system.

### 3. Case study. A thermal solar plant

Thermal solar plants have been in production for more than 25 years. The current decrease in government incentives for renewable energy sources has forced companies to study possibilities for extending useful life. Because of this, potential plant re-investments must also be re-evaluated, incorporating future operating and environmental conditions within equipment reliability analysis.
In these plants the combination of mechanical and thermal stresses makes reliability analysis important. This is not only because of the direct costs of failures, but also because of their significant indirect loss of profit, as well as the associated environmental and safety risks (Ennis, 2009). By developing a model for failure prediction we can avoid these risks. This model will be applicable to each critical failure mode, because symptoms and causes may differ among failure modes and the effect of equipment conditions may apply in a different manner. Understanding this point is important; efforts in failure mode analysis will be intense but worthwhile. For instance, defining suitable covariates per failure mode could add enormous value in protecting our assets and their contribution to the business. This type of thermal solar plant is usually built modularly; therefore the possibility of replicating the same model for different modules and regions is also of great interest. With that in mind, we have tried to develop an ANN model that is easy to reproduce, and to update it with the most common parameters found in this type of plant. The solar thermal power plant under consideration has a nominal power of 49.9 MW with an annual production of 180 million kWh and occupies an area of 2,700,000 square meters. It is located in the southern part of Spain and it will supply energy to more than 100,000 dwellings for an operational time of 25 years. Even in the absence of enough solar radiation, its storage subsystem is able to provide energy for 7.5 h. All the energy produced (180 million kWh/year) is supplied through the distributor for 51 million €/year of production (at an initial price of 0.2849 €/kWh, subsequently reduced due to a legal requirement). The main subsystems of this kind of power plant are (see Fig. 1):

Fig. 1. Thermal solar plant and functional description.

- Solar field. It is composed of 8,064 parabolic trough solar collectors with 225,792 mirrors. It heats the high-temperature oil circulated in the HTF loop.
- HTF loop, which conveys high-temperature fluid to the heat exchanger in the steam generator.
- Water loop, where the steam flow is condensed, cooled and recirculated as a water flow to the steam generator.
- Steam generator, which transforms water into steam to activate the turbine.
- Turbine, which transforms the mechanical energy of the steam flow into electrical power.

In our case study, we have selected a common system to illustrate the model implementation over real data and in a remote monitoring system: the electrical pumps in HTF loops. These are several large pumps in charge of circulating the thermal oil (Dowtherm A) throughout the plant. HTF pumps in a thermal solar plant tend to be viewed as a low-hazard process, despite being pressurized heat transfer systems that could produce fire and explosion hazards, where leaks can produce a potentially flammable mist or contamination (Ennis, 2009). Moreover, their operation is critical in order to keep the desired availability of the power plant, so all the production is supported by a 2$$+$$1 HTF pump configuration (two pumps working in parallel and a spare one). Both active pumps jointly contribute 50% of HTF recirculation oil and a pump failure could affect up to 50% of daily production. Therefore, HTF pumps need surveillance to ensure their efficiency and to control their deterioration. In order to optimize plant efficiency, the remote monitoring system would set the HTF pump speed through changes in the variable frequency drives, according to different temperatures, the direct beam irradiance, pressures, and also the potential fluid density.
For example, a larger difference between inlet and outlet HTF temperatures requires less impelled flow.

For the purpose of this paper, we have selected the failure mode ‘damaged mechanical seal’, which causes significant production losses. This failure mode arises from many factors, such as high seal operating temperature, excessive pump vibration due to cavitation, misalignment of parts, etc. The problem increases during the summer period, when the pumps run at full load and production losses are highest. Potential mechanical seal failures are predicted using our back-propagation neural network, fed with the pumps' historical data from the last 3 years. We focus on failures that result from equipment deterioration due to operational and geographical (environmental) features that could have a great impact on equipment condition. For instance, we know that extreme fluctuating cycles of inlet–outlet pressure and high temperatures can degrade the oil, producing contamination and corrosion; contraction–expansion may also result in misalignments. For this case, predicting the problem in real time using process-control variables and a transfer function (thermodynamic approach) was found to be impossible. Therefore, all representative contributions to pump degradation are combined in a single (survival) function which reflects the probability of the failure mode.

In this document we show the ability of an ANN to reproduce a self-adapting reality by fitting a survival function under complex and noisy operating conditions. Our prediction models have innovative features compared to previous works in the literature. The ANN models use not only parametric estimations of the failure times, but also environmental variables, such as external humidity, and asset condition variables, such as working temperature and different operating times and cycles.
In addition, parametric methods are combined with the ANN in order to develop a stable model which can be implemented easily and quickly in a remote monitoring system. Through this, early detection of degradation is possible before failures affect production, people or the environment, and a quantitative measure of risk can be computed as a percentage.

4. Developing and implementing the ANN model

In order to approach the problem of real-time condition estimation that could lead to early warnings for the failure mode, the modified ANN architecture is developed from selected variables: those that our remote monitoring system can detect periodically and automatically, and that are the most representative of the effects on the damaged seal of HTF pumps. Specific information about the development process is as follows:

○ We selected two plants, with eight failures each.
○ The remote monitoring variables for the input layer were: flow of HTF (l/s); working temperature of HTF (°C); ambient humidity (%); operation time of the pumps (days); the modelled survival status; and the threshold neuron.
○ Periods for comparison were selected to detect the existence of the failure mode and the most representative variables.
○ The data were reorganized, eliminating abnormal data that could distort the results. All input values were normalized to the same scale to simplify calculations and analysis; the normalized values have to be de-normalized before comparison.
○ A single hidden layer with nine neurons is used (fewer than twice the number of input neurons).

In summary, the implementation of our proposed model is based on two phases:

○ The estimation, in a first step, of the survival function with a parametric GRP-II Weibull over two groups of times to failure in which the covariates are the same: one over the eight failures of plant 1 and the other over the eight failures of plant 2. We then obtain a characteristic $$\alpha$$, $$\beta$$ and $$q$$ for each plant, based only on times to failure (no covariates are used at this level), as shown in Table 2. We take the Weibull CDF, maintaining $$\beta_{nk}$$ in each time interval in each plant (for each group), and taking $$\alpha_{nkj} = T_{nkj}\cdot \mathrm{Ln}(2)^{1/\beta_{nk}}$$. As a result, for each specific failure the probability of failure rises from 0 towards 1 near the maximum failure time. Table 2 shows, for each plant, the time to failure ($$T_{nkj}$$) and the weighted, modified $$\alpha_{nkj}$$ of each of the 16 failures.
○ The modelling, in a second step, of the survival function for each specific failure, adapting the parametric estimation according to the covariates. For this purpose we rely on the modified R&C ANN. The discretized times $$t^{(i)}$$ inside the intervals are selected according to the influence of the covariates on the degradation of the failure mode. All the inputs (covariates and CDF) are therefore redefined with this period (notice that this requires replicating the covariates, with their average value, after each specific time to failure). In our example, for a population of six pumps (three per plant) and sixteen registered failures in three years, after filtering and reorganization, 914 discretized times are used to train the survival ANN model.

Consequently, the data to train and test the GRP–ANN are reorganized as below and illustrated in Table 3 for plant 1 and failure number 3 ($$\alpha_{nkj} = 148.33$$, $$\beta_{nk} = 3.7$$). That is, if $$0 \leqslant t^{(i)} \leqslant T_{nkj}$$ then

$$X_{nk}^{(i)} = \text{real value of vector } X_{nk} \text{ in each } t^{(i)}, \qquad CDF(t^{(i)}) = 1 - \left[ 1 \Big/ \exp\left[ \left( \frac{t^{(i)}}{T_{nkj}\cdot \mathrm{Ln}(2)^{1/\beta_{nk}}} \right)^{\beta_{nk}} \right] \right]$$

and if $$t^{(i)} > T_{nkj}$$ then

$$X_{nk}^{(i)} = \dot{X}_{nk} \text{ of previous } t^{(i)} \text{ to } T_{nkj}, \qquad CDF(t^{(i)}) = 1 - \left[ 1 \Big/ \exp\left[ \left( \frac{t^{(i)}}{T_{nkj}\cdot \mathrm{Ln}(2)^{1/\beta_{nk}}} \right)^{\beta_{nk}} \right] \right],$$

where $$\dot{X}_{nk}$$ is the average value of the covariates over the discretized times up to $$T_{nkj}$$.
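The reorganization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`weibull_alpha`, `weibull_cdf`, `reorganize`) and the row layout are hypothetical, the 0/1 indicator follows the pattern of the "Normal Ravdin ANN" column of Table 3, and only one covariate (ambient humidity) is used to keep the example short. The numerical inputs are the values reported for failure 3 of plant 1.

```python
import math


def weibull_alpha(t_fail, beta):
    """Scale parameter alpha_nkj = T_nkj * Ln(2)^(1/beta_nk), as given in the text."""
    return t_fail * math.log(2.0) ** (1.0 / beta)


def weibull_cdf(t, alpha, beta):
    """Weibull CDF, 1 - 1/exp((t/alpha)^beta)."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def reorganize(t_grid, observed, t_fail, beta):
    """Build one row per discretized time: (t, covariates, indicator, CDF).

    `observed` holds the covariate readings for the grid points with
    t <= t_fail, in order. After the failure time the covariates are
    replaced by their average, and the indicator switches from 0 to 1.
    """
    n_obs = sum(1 for t in t_grid if t <= t_fail)
    if len(observed) != n_obs:
        raise ValueError("need one covariate row per grid point up to t_fail")
    alpha = weibull_alpha(t_fail, beta)
    mean_x = [sum(col) / n_obs for col in zip(*observed)]
    rows = []
    for i, t in enumerate(t_grid):
        failed = t > t_fail
        x = mean_x if failed else list(observed[i])
        rows.append((t, x, int(failed), weibull_cdf(t, alpha, beta)))
    return rows


# Failure 3 of plant 1: T = 163.78 and beta = 3.7 (Table 2).
T_FAIL, BETA = 163.78, 3.7
t_grid = list(range(10, 201, 10))
# Ambient humidity (%) readings up to the failure time, taken from Table 3.
humidity = [[97], [99], [94], [98], [100], [84], [65], [70],
            [91], [80], [65], [77], [99], [86], [87], [72]]

rows = reorganize(t_grid, humidity, T_FAIL, BETA)
print(round(weibull_alpha(T_FAIL, BETA), 2))  # 148.33, as in Table 2
for t, x, flag, cdf in rows:
    print(t, x, flag, round(cdf, 2))
```

With these inputs the CDF column reproduces the last column of Table 3 to two decimals, and the humidity is frozen at its average once $$t^{(i)}$$ passes $$T_{nkj}$$.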
Table 2 GRP Weibull parameters and reorganization for parametric estimation of survival function

| Plant 1, $$j$$ | $$T_{nkj}$$ | $$\alpha_{nkj}$$ | Plant 2, $$j$$ | $$T_{nkj}$$ | $$\alpha_{nkj}$$ |
|---|---|---|---|---|---|
| 1 | 299.53 | 271.28 | 1 | 181.78 | 167.38 |
| 2 | 277.76 | 251.56 | 2 | 170.53 | 157.02 |
| 3 | 163.78 | 148.33 | 3 | 288.00 | 265.18 |
| 4 | 176.22 | 159.60 | 4 | 128.03 | 117.89 |
| 5 | 149.21 | 135.14 | 5 | 277.89 | 255.87 |
| 6 | 214.71 | 194.46 | 6 | 256.00 | 235.72 |
| 7 | 136.59 | 123.71 | 7 | 194.00 | 178.63 |
| 8 | 170.90 | 154.78 | 8 | 300.00 | 276.23 |
| $$\alpha_{n1}$$ | 220.00 | | $$\alpha_{n2}$$ | 247.00 | |
| $$\beta_{n1}$$ | 3.70 | | $$\beta_{n2}$$ | 4.44 | |

Table 3 Reorganized survival data of failure 3 in plant 1 to train and test the ANN

| Failure | $$t^{(i)}$$ | $$X_{1nk}^{(i)}$$, Operating hours | $$X_{2nk}^{(i)}$$, Ambient humidity | $$X_{3nk}^{(i)}$$, Working temp | $$X_{4nk}^{(i)}$$, Flow | Normal Ravdin ANN | $$X_{5nk}^{(i)}$$, CDF($$t^{(i)}$$) |
|---|---|---|---|---|---|---|---|
| 3 | 10 | 10,385 | 97 | 291.00 | 324.42 | 0 | 0.00 |
| 3 | 20 | 10,395 | 99 | 301.75 | 330.27 | 0 | 0.00 |
| 3 | 30 | 10,405 | 94 | 322.50 | 341.57 | 0 | 0.00 |
| 3 | 40 | 10,415 | 98 | 301.50 | 330.14 | 0 | 0.01 |
| 3 | 50 | 10,425 | 100 | 306.25 | 332.72 | 0 | 0.02 |
| 3 | 60 | 10,435 | 84 | 307.50 | 333.40 | 0 | 0.03 |
| 3 | 70 | 10,445 | 65 | 299.25 | 328.91 | 0 | 0.06 |
| 3 | 80 | 10,455 | 70 | 318.75 | 339.53 | 0 | 0.10 |
| 3 | 90 | 10,465 | 91 | 307.75 | 333.54 | 0 | 0.15 |
| 3 | 100 | 10,475 | 80 | 321.00 | 340.75 | 0 | 0.21 |
| 3 | 110 | 10,485 | 65 | 327.75 | 344.43 | 0 | 0.28 |
| 3 | 120 | 10,495 | 77 | 330.00 | 345.65 | 0 | 0.37 |
| 3 | 130 | 10,505 | 99 | 304.00 | 331.50 | 0 | 0.46 |
| 3 | 140 | 10,515 | 86 | 326.12 | 343.54 | 0 | 0.55 |
| 3 | 150 | 10,525 | 87 | 324.13 | 342.46 | 0 | 0.65 |
| 3 | 160 | 10,535 | 72 | 326.40 | 343.69 | 0 | 0.73 |
| 3 | 170 | 10,545 | 85 | 313.48 | 336.66 | 1 | 0.81 |
| 3 | 180 | 10,555 | 85 | 313.48 | 336.66 | 1 | 0.87 |
| 3 | 190 | 10,565 | 85 | 313.48 | 336.66 | 1 | 0.92 |
| 3 | 200 | 10,575 | 85 | 313.48 | 336.66 | 1 | 0.95 |

The output of the ANN model is the estimated probability of failure, developed from the GRP-II model, with the effect of the covariates roughly proportional to the Weibull survival probability. The ANN analysis went through the processes of training, prediction and testing; the data set of variables is given in Table 4.

Table 4 Data set of variables

| Variables of vector $$X_{nk}$$ | Max. | Ref. | Min. | Unit |
|---|---|---|---|---|
| $$X_{1nk}$$, Operating hours | 20,000 | 15,000 | 0 | h |
| $$X_{2nk}$$, Flow | 600 | 300 | 55 | kg/s |
| $$X_{3nk}$$, Ambient humidity | 100 | 75 | 25 | % |
| $$X_{4nk}$$, Temperature | 400 | 300 | 290 | $$^{\circ}$$C |
| $$X_{5nk}$$, Survival function | 1 | 0.5 | 0 | |

The learning algorithm parameters were as follows: (a) maximum number of cycles $$=$$ 1000, (b) maximum validation failures $$=$$ 40, (c) min_grad $$=$$ 1.0e$$-$$10, (d) goal $$=$$ 0, (e) $$\mu = 0.005$$, (f) $$\mu$$_dec $$=$$ 0.1, (g) $$\mu$$_inc $$=$$ 10, (h) $$\lambda = 0$$, (i) min Error $$=$$ 0.00001833. The results obtained in this case indicate a well-fitted model, as shown in Table 5, which reports the results of the model training process. The mean square error (MSE) in training and testing validates the ANN, as it measures the average squared distance between the predictions obtained and the real values.

Table 5 Results of training in developed model

| Results | Value |
|---|---|
| MSE training | 90.02918 |
| MSE test | 335.9361 |
| $$R^{2}$$ training | 0.948243 |
| $$R^{2}$$ test | 0.8262953 |

By contrast, had we used the Ravdin and Clark model directly, the results would have been less accurate, as Table 6 shows.
Table 6 Results of training with Ravdin and Clark

| Results | Value |
|---|---|
| MSE training | 352.2306 |
| MSE test | 492.8387 |
| $$R^{2}$$ training | 0.8590198 |
| $$R^{2}$$ test | 0.7971156 |

In the developed model, $$R^{2}$$ is consistent with this result, with 94.8% of the variability explained in training. Figures 2 and 3 represent the resulting predictions. Figure 2 shows, with a dashed line, the training of both the Ravdin and Clark ANN and the survival ANN: (2a) for the normal R&C ANN, and (2b) for the modified R&C ANN. Figure 3 has a straight line to indicate the best approximation for error minimization. For validation purposes, 25% of the historical data is used to estimate the generalization error.

Fig. 2. (a) Normal R&C ANN training and (b) Modified R&C ANN training. In both graphs solid lines are the modelled CDF and dashed lines are the CDF predicted by the ANNs.

Fig. 3. Modified R&C ANN CDF predictions ($$Y\prime ^{(i)}$$) versus modelled CDF ($$Y^{(i)}$$).

This case study has produced a good prediction of a real failure based on 3 years of data. Using less than 3 years of data is possible, but deducing the relationships between covariates and degradation may then be difficult, and the seasonal behaviour of some covariates would degrade future predictions. Our recommendation is to employ more than 2 years of data where there is environmental influence. However, fewer discretized times may be needed in other applications if the covariates are more stable over time.
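As a rough illustration of the normalization step and of the two quality measures reported in Tables 5 and 6, the following Python sketch may help. It rests on assumptions that the text does not confirm: min-max scaling between the Min. and Max. bounds of Table 4 is assumed as the normalization, the standard definitions of MSE and of the coefficient of determination are assumed, and the `predicted` values are purely hypothetical numbers chosen for the example (the `modelled` values are taken from the CDF column of Table 3).

```python
def min_max_normalize(x, x_min, x_max):
    """Scale x into [0, 1] using the variable's bounds (assumed scheme)."""
    return (x - x_min) / (x_max - x_min)


def min_max_denormalize(z, x_min, x_max):
    """Invert min_max_normalize, recovering the original units."""
    return z * (x_max - x_min) + x_min


def mse(y_true, y_pred):
    """Mean square error between modelled and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Temperature bounds from Table 4 (Min. = 290, Max. = 400, in deg C).
TEMP_MIN, TEMP_MAX = 290.0, 400.0
z = min_max_normalize(301.75, TEMP_MIN, TEMP_MAX)
back = min_max_denormalize(z, TEMP_MIN, TEMP_MAX)

# Modelled CDF for t = 100..140 (Table 3) against hypothetical predictions.
modelled = [0.21, 0.28, 0.37, 0.46, 0.55]
predicted = [0.20, 0.30, 0.35, 0.47, 0.56]  # hypothetical values
print(z, back)
print(mse(modelled, predicted), r_squared(modelled, predicted))
```

De-normalizing before computing the error, as the text requires, keeps the reported error in the units of the original variable rather than in the scaled range.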
Returning now to preventive maintenance, our mathematical tool allows one to implement an intelligent preventive maintenance strategy. The strategy is self-adaptive to the observed imperfection in repairs and to the influence of the selected covariates on potential failures. Finally, the strategy can trigger a preventive maintenance action according to two possible business rules:

○ A rule based on a given level of confidence or failure probability as a general reference, i.e. a proportion of CDF($$t^{(i)}$$). In our case study, the level of estimated CDF($$t^{(i)}$$) which triggers preventive maintenance is 0.6.
○ Another rule based on risk-cost-benefit analysis. In this case, we consider not only the failure probability but also the cost, looking for its minimum expected value:

$$C(t^{(i)}) = \frac{CDF(t^{(i)})}{t^{(i)}}\cdot Cost_{\mathrm{corrective}} + \left[ 1 - \frac{CDF(t^{(i)})}{t^{(i)}} \right]\cdot Cost_{\mathrm{preventive}};$$

that is, we consider the risk of being preventive several steps ahead and the risk of waiting for the failure (due to corrective unavailability).

For the second business rule, the last step of the methodology is to obtain an on-line economic estimation of risk, as in Fig. 4. The idea is to determine the optimal interval between preventive actions ($$t^{(i)}$$) (Campbell & Jardine, 2001) that minimizes the total expected cost of equipment maintenance per unit time. To do so, the criterion governing the release of the PM action is determined by comparing the economic value of risk (for a specific period to be selected) of the following two maintenance strategies:

Strategy 1: Doing preventive maintenance as soon as possible. This would restore the equipment to a certain condition, minimizing the risk of a failure for a certain period (in this case study this condition is reached by updating only the covariate values to the values of normal equipment operating conditions), but would cost the price of the corresponding preventive maintenance activity (in our case $$Cost_{\mathrm{preventive}}(t^{(i)}) = $$ 14,500 €).
This calculation is computed on-line and compared to the risk of:

Strategy 2: Doing nothing. The economic value of this strategy is calculated by computing only the on-line risk of doing corrective maintenance when the failure takes place (in this case with a higher probability than under Strategy 1),

$$Cost_{\mathrm{corrective}}(t^{(i)}) = 8{,}000\ \text{€} + \left( 5{,}822\ \frac{\text{€}}{\text{h}} \cdot 8\ \text{h} \right) = 90{,}440\ \text{€}$$

(where the average direct cost of a corrective action is considered to be about 8,000 € and the indirect cost 82,440 €, estimated as loss of profit at 5,822 €/h with an MTTR $$=$$ 8 h).

Fig. 4. Risk-cost analysis based on expected costs and searching the right time to trigger preventive action. (a) Solid line is the minimum expected cost, (b) dashed line is the expected preventive cost and (c) dotted line is the expected corrective cost.

When the on-line risk of doing nothing (Strategy 2) exceeds the risk of the PM activity (Strategy 1) to a certain extent, the PM activity is automatically released and carried out. The extent by which the risk must be exceeded is understood here as a company policy. Figure 4 shows this concept. For instance, between $$t^{(i)} = 120$$ and 140 days there is a moment at which the Strategy 2 risk increases beyond the Strategy 1 (PM) risk; PM would then be scheduled and released. Note how decisions are therefore taken on-line, based on the risk figures of each strategy.

Finally, the repercussions of the chosen prediction model have to be evaluated with a cost-benefit analysis, prior to its implementation and communication to the entire organization. The most vulnerable (and/or sensitive) points of these pumps are the mechanical seals.
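The two business rules discussed in this section can be sketched in Python as follows. This is only an illustrative sketch under explicit assumptions: the function names are hypothetical, the first rule is assumed to fire at the first discretized time whose CDF reaches the 0.6 threshold, and the cost expression is read from the flattened formula as $$C = (CDF/t)\cdot Cost_{\mathrm{corrective}} + [1 - CDF/t]\cdot Cost_{\mathrm{preventive}}$$, which is one possible reading of the source. The cost figures and the Weibull parameters of failure 3 in plant 1 are those reported in the case study.

```python
import math

COST_PREVENTIVE = 14_500.0  # EUR, preventive action cost from the case study
COST_CORRECTIVE = 90_440.0  # EUR, corrective cost from the case study


def weibull_cdf(t, alpha, beta):
    """Weibull CDF, 1 - exp(-(t/alpha)^beta)."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def first_trigger(t_grid, cdf_values, threshold=0.6):
    """Rule 1: first discretized time whose CDF reaches the threshold.

    Returns None when the threshold is never reached on the grid.
    The ">=" comparison is an assumption; the text only gives the level.
    """
    for t, p in zip(t_grid, cdf_values):
        if p >= threshold:
            return t
    return None


def expected_cost(t, cdf_t, cost_corr=COST_CORRECTIVE, cost_prev=COST_PREVENTIVE):
    """Rule 2 cost expression, under the reading stated in the lead-in."""
    rate = cdf_t / t
    return rate * cost_corr + (1.0 - rate) * cost_prev


# Failure 3 of plant 1: alpha = 148.33, beta = 3.7, on a 10-unit grid.
ALPHA, BETA = 148.33, 3.7
t_grid = list(range(10, 201, 10))
cdf_values = [weibull_cdf(t, ALPHA, BETA) for t in t_grid]

print(first_trigger(t_grid, cdf_values))  # 150
for t, p in zip(t_grid, cdf_values):
    print(t, round(p, 2), round(expected_cost(t, p), 2))
```

On this grid the 0.6 level is first reached at $$t^{(i)} = 150$$, since the modelled CDF is 0.55 at 140 and 0.65 at 150. How far the Strategy 2 risk must exceed the Strategy 1 risk before PM is released remains a company policy and is not modelled here.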
The seals are responsible for preventing leakage of the fluid (a dangerous fluid at high temperature and pressure). Thanks to this research, the risk associated with the ‘damaged seals’ failure mode could be reduced by 247,319 €/plant per year, with an estimated potential impact over the life cycle of the plant (25 years) of 7.24 M€.

5. Conclusions

Thermal solar plant managers want to ensure longer profitability periods with more reliable plants. To ensure profitability during the life cycle of the plant, we must ensure the reliability of critical equipment and the maximum extension of its life cycle; otherwise failure costs will penalize the expected profit. Throughout this document, we suggest applying an ANN model per failure mode, and we encourage a practical implementation in SCADA systems for different plants. This methodology may ease and improve decision-making and risk modelling, enabling reductions in the direct and indirect costs of corrective maintenance, or allowing the display of residual life until total equipment failure. When enough data for significant training are available, a better implementation of our methodology will help to reduce costs and will improve knowledge of the life cycle of the plant under non-homogeneous operational and environmental conditions. The capacity of an ANN for self-learning from data sources (sometimes noisy or affected by communication losses), thanks to reiterative memory, is important. In our case study we had a vast quantity of data, although these data were sometimes affected by sensor reading or communication problems. A back-propagation perceptron ANN is recommended for automation developments with real-time use. Furthermore, more advanced ANN models could be applied to support additional variables.

Acknowledgements

The authors would like to acknowledge the support of the Scientific Chair of MM BinLadin for Operation and Maintenance Technology at Taibah University, Madina, Saudi Arabia.
Funding

SMARTSOLAR Project (OPN—INNPACTO-Ref IPT-2011-1282-920000).

References

Biganzoli, E., Boracchi, P., Mariani, L. & Marubini, E. (1998) Feed forward neural networks for the analysis of censored survival data: a partial logistic regression approach. Stat. Med., 17, 1169–1186.

Campbell, J. D. & Jardine, A. K. (2001) Maintenance Excellence: Optimizing Equipment Life-Cycle Decisions. New York: M. Dekker. ISBN 9781420029406.

Caner, M., Gedik, E. & Keçebaş, A. (2011) Investigation on thermal performance calculation of two type solar air collectors using artificial neural network. Expert Syst. Appl., 38, 1668–1674.

Cohen, P. R. & Feigenbaum, E. A. (eds). (2014) The Handbook of Artificial Intelligence, vol. 3. Oxford: Butterworth-Heinemann. ISBN 9781483214399.

Cox, D. R. (1972) Regression Models and Life-Tables. J. R. Stat. Soc. Series B (Methodological), London: Wiley, 34, 187–220.

Cox, D. R. & Oakes, D. (1984) Analysis of Survival Data, Monographs on Statistics & Applied Probability, vol. 21. United States: Chapman & Hall, CRC Press. ISBN 9780412244902.

Curry, B., Morgan, P. & Beynon, M. (2000) Neural networks and flexible approximations. IMA J. Manag. Math., 11, 19–35.

Dagpunar, J. S. (1997) Renewal-type equations for a general repair process. Qual. Reliab. Eng. Int., 13, 235–245.

Ennis, T. (2009) Safety in design of thermal fluid heat transfer systems. Symposium series (Rugby ed.). Hazards XXI: process safety and environmental protection, Institution of Chemical Engineers, vol. 155, 162–169.

Faraggi, D. & Simon, R. (1995) A neural network model for survival data. Stat. Med., 14, 73–82.

González-Prida, V., Barberá, L., Márquez, A. C. & Fernández, J. G. (2014) Modelling the repair warranty of an industrial asset using a non-homogeneous Poisson process and a general renewal process. IMA J. Manag. Math., 26, 171–183. doi: 10.1093/imaman/dpu002.

Hougaard, P. (2012) Analysis of multivariate survival data. New York: Springer Science and Business Media.

Kalogirou, S. (2007) Artificial intelligence in energy and renewable energy systems. New York: Nova Publishers.

Kuo, C. (2011) Cost efficiency estimations and the equity returns for the US public solar energy firms in 1990–2008. IMA J. Manag. Math., 22, 307–321.

Lapedes, A. & Farber, R. (1987) Nonlinear signal processing using neural networks: prediction and system modelling. IEEE international conference on neural networks, San Diego, CA, USA (No. LA-UR-87-2662; CONF-8706130-4).

Lawrence, S., Giles, C. L. & Tsoi, A. C. (1998) What size neural network gives optimal generalization? Convergence properties of backpropagation.

Lee, E. T. & Wang, J. (2003) Statistical methods for survival data analysis, vol. 476. New Jersey: Wiley.

Liestøl, K., Andersen, P. K. & Andersen, U. (1994) Survival analysis and neural nets. Stat. Med., 13, 1189–1200.

Malcolm, B., Bruce, C. & Morgan, P. (1999) Neural networks and finite-order approximations. IMA J. Manag. Math., 10, 225–244.

Martín, L., Zarzalejo, L. F., Polo, J., Navarro, A., Marchante, R. & Cony, M. (2010) Prediction of global solar irradiance based on time series analysis: application to solar thermal power plants energy production planning. Sol. Energ., 84, 1772–1781.

Mellit, A., Benghanem, M., Arab, A. H. & Guessoum, A. (2005) An adaptive artificial neural network model for sizing stand-alone photovoltaic systems: application for isolated sites in Algeria. Renew. Energy, 30, 1501–1524.

Ohno-Machado, L. (2001) Modeling medical prognosis: survival analysis techniques. J. Biomed. Inform., 34, 428–439.

Ravdin, P. M. & Clark, G. M. (1992) A practical application of neural network analysis for predicting outcome of individual breast cancer patients. Breast Cancer Res. Treat., 22, 285–293.

Rosenblatt, F. (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev., 65, 386.

Smith, P. J. (2002) Analysis of failure and survival data. United States: Chapman & Hall, CRC Press. ISBN 9781584880752.

Xiang, A., Lapuerta, P., Ryutov, A., Buckley, J. & Azen, S. (2000) Comparison of the performance of neural network methods and Cox regression for censored survival data. Comput. Stat. Data Anal., 34, 243–257.

© The authors 2016. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.

### Journal

IMA Journal of Management Mathematics, Oxford University Press

Published: Jan 1, 2018
