DYNAMIC MODELING OF GROUNDWATER POLLUTANTS WITH BAYESIAN NETWORKS

Khalil Shihab

doi:10.1080/08839510701821645

Shihab, Khalil (2008-04-18). Applied Artificial Intelligence, 22:352–376. Copyright © 2008 Taylor & Francis Group, LLC. ISSN: 0883-9514 print/1087-6545 online.

Khalil Shihab, School of Computer Science and Mathematics, Victoria University, Victoria, Australia

The emphasis on the need to protect groundwater quality has increased interest in groundwater quality assessment. Water experts and researchers in the area, however, argue that the techniques currently in use do not measure groundwater contamination accurately, mainly because they neglect both the probabilistic dependencies between pollutants and the precision and accuracy of the test methods used by environmental laboratories. This work therefore describes the development and application of a prototype Dynamic Bayesian Network (DBN) that addresses these problems through a temporal probabilistic model. First, we present a new technique for data preprocessing. Then we describe the network models we developed and the methods used to build them. Various challenges, such as acquiring groundwater datasets, identifying pollutants, and anticipating potential problem contaminants, are addressed. Finally, we present the results of applying these models.

INTRODUCTION

Declining surface water and groundwater quality is regarded as the most serious and persistent issue affecting Oman. The Sultanate faces severe challenges as it confronts the rapidly growing and increasingly complicated problem of groundwater contamination in and around hazardous waste disposal sites across the nation. Many observable factors contribute to the deterioration of water quality; these factors need to be monitored and their maximum allowable limits determined.
Declining water quality manifests itself in a number of ways, for example, elevated nutrient levels, acid from mines, domestic waste and oil spills, waste from distilleries and factories, salt water intrusion, and temperature. These factors and others provide the input data for our computer system.

Address correspondence to Khalil Shihab, currently on sabbatical leave to the Department of Computer Science, Box 36, SQU, Al-Khod 123, Oman. E-mail: kshihab@squ.edu.om

Groundwater quality and pollution are determined and measured by comparing physical, chemical, biological, microbiological, and radiological quantities and parameters to a set of standards and criteria (Anderson and Woessner 1992). A criterion is essentially a scientific quantity on which a judgment can be based (Wu-Seng 1993). In this work, however, we considered only the chemical parameters total dissolved solids (TDS), electrical conductivity (EC), water pH, chemical oxygen demand (COD), and nitrate (NO₃); for more details, see the section Bayesian Networks Development below. We chose these parameters mainly because they are recommended by experts and researchers in the area. In addition, our analysis of data collected from many wells indicated that these chemical parameters are useful indicators of groundwater quality because they account for most of the variance in the data.

Various countries have attempted to develop satisfactory procedures for assessing, monitoring, and controlling contamination of the groundwater supply in and around hazardous waste disposal sites (Anderson and Woessner 1992; Borsuk et al. 2004). These attempts have resulted in various environmental regulations that focus on the maximum allowable limits of hazardous pollutants in the groundwater supply.
However, these regulations pay scant attention to the nature of groundwater data and to the development of valid statistical procedures for detecting and monitoring groundwater contamination. Artificial Intelligence (AI) techniques were first applied to the interpretation of biomonitoring data (Trigg et al. 2000). Other work was based on pattern recognition using artificial neural networks (NNs). A more recent study described a prototype Bayesian belief network for diagnosing acidification in Welsh rivers. Hobbs (1997) uses Bayesian probabilities to examine the risk that climate change poses to water resources, but does not extend this to drinking water quality or quantity. Egerton (1996) performs a risk analysis of water systems in terms of the cost-effectiveness of reliability improvements; while that study examines issues of regulatory compliance, it does not evaluate the effect of operator intervention or weather events on vulnerability.

Bayesian methods of statistical inference offer the greatest potential for groundwater monitoring, because they can account for the variability arising from three different sources of error (analytical test errors, sampling errors, and time errors) in addition to the variability in the true concentration (Chong and Walley 1996). Bayesian methods can also be used to significantly increase the precision and accuracy of the test methods used in a given environmental laboratory (Varis 1995). The mobility of salt and other pollutants under steady-state and transient conditions can be predicted by applying Bayesian models across a range of spatial and temporal scales under varying environmental conditions. Bayesian networks use statistical techniques that tolerate subjectivity and small data sets.
Furthermore, these methods are simple to apply and flexible enough to respond to scientific complexity without being impeded by purely technical limitations.

The process of Bayesian analysis begins by postulating a model in light of all available knowledge about the relevant phenomena. The previous knowledge, represented by the prior distribution of the model parameters, is then combined with the new data through Bayes' theorem to yield the current knowledge, represented by the posterior distribution of the model parameters. This updating of information about the unknown model parameters is repeated sequentially as new information becomes available.

This work addresses the assessment of groundwater quality in the Sultanate of Oman, especially in the Salalah plain. Its primary aim is to develop a groundwater quality model and a prototype computer system to assess and predict the impact of pollutants on the water column.

SULTANATE OF OMAN, SALALAH PLAIN

Oman (see Figure 1) has very substantial groundwater resources on which the country's agriculture depends. The oil boom, the resulting population boom (possibly fivefold since the 1960s), and new investment have led to a large expansion of irrigated areas. The demand for domestic water supply has also increased as living standards have risen. Oman has to tackle simultaneously, within a compressed timescale, the need to evaluate its groundwater resources and to manage them effectively.

FIGURE 1 Sultanate of Oman.

The main populated areas are located in the north, along the flanks of the mountains, and in the south, around Oman's second city, Salalah. The Salalah plain extends over an area of 253 km² from the Arabian Sea coastline of Oman north to the Mountains of Dhofar. It is the only region in Oman to benefit from a substantial amount of rainfall from the southern monsoon (Khareef).
The average annual rainfall is about 110 mm but can range from 70 to 360 mm; July–August is normally the "wet" period. Groundwater derived from aquifers in the central part of the plain is of good quality. Some of the spring water is carried by Falajs (tunnels dug horizontally to tap underground water and transport it to agricultural fields that are often tens of kilometers away) to irrigate part of the plain. Recharge is by underflow from the mountains and from the springs. Modern irrigation techniques are used on large commercial farms, mainly for the production of forage crops such as alfalfa and Rhodes grass.

Recent economic development in the country, together with rapid population growth, has not only increased the demand for water but also created many threats to water resources and quality, and a number of groundwater pollution incidents have been reported. Extensive use of groundwater resources without regard to the safe yield of aquifers is considered the main cause of pollution. Point and non-point source contamination from agricultural, industrial, and domestic uses are other sources of groundwater contamination. Seawater intrusion is another concern, since many farms are situated along the coastline.

The Ministry of Water Resources (MWR) in the Sultanate of Oman has been monitoring groundwater quality since 1994, and the regional monitoring networks were completed in 1995 (Milligan and Gharbi 1995). More than 50,000 monitoring wells have been inventoried; the water samples collected and analyzed in the course of this work provide baseline data for environmental monitoring, which are consolidated in the national water quality database. The MWR has attempted to predict groundwater quality by interpreting groundwater data with traditional linear regression and nonmetric multidimensional scaling models.
So far these models have proven unsatisfactory, mainly because they ignore the probabilistic temporal dependencies between water quality constituents. This has prompted the development of new models based on Bayesian techniques, which are the focus of this work.

FIGURE 2 Taqah region, which is the eastern part of the Salalah plain in Oman.

This work therefore develops and applies Bayesian techniques to forecast groundwater pollution levels in the Salalah plain, in particular in the Taqah area in the eastern part of the plain (see Figure 2), for several reasons:

• To take the necessary emergency measures if pollution is going to affect groundwater quality (i.e., when the pollution level is expected to exceed a certain threshold);
• To estimate pollution in areas where there are no measurements;
• To take preventive actions in some areas;
• To find out which variables have the most influence on groundwater quality.

DATA COLLECTION

The Ministry of Water Resources (MWR) maintains data on the concentrations of harmful substances in the groundwater at the Taqah monitoring sites, which are located in the Salalah plain in the south of the Sultanate of Oman (MWR 2004). We observed that good-quality data were obtained from several monitoring wells in this region. Because monitoring wells are lacking in certain parts of the region, we filled in the missing measurements with data obtained from the Oman Mining Company (OMCO) and the Ministry of Environmental and Regional Municipalities (MRME). The MWR identified the datasets collected from these monitoring wells as important for assessing groundwater quality and for predicting the effect of certain pollutants on drinking water. The period covered at these locations is from 1994 to 2004 (Dames and Moore 1992; Entec Europe Limited 1998; MWR 2004).
Dames & Moore, Inc. represents a global network of companies known as the Dames & Moore Group, which specializes in a variety of civil engineering, environmental, and earth sciences disciplines and provides consulting services to corporate and government customers. Each site has several monitoring wells; water samples were collected periodically from these wells, and the concentrations of the pollutants in these samples were recorded.

DATA PREPROCESSING USING BAYESIAN REASONING

Data for water quality assessment are normally collected from various monitoring wells and then analyzed in environmental laboratories to measure the concentrations of a number of water quality constituents. We found that the methods used by these laboratories do not emphasize accuracy, and that both laboratory and validation personnel are largely unaware of the possibility of false positives in environmental data. To overcome this problem and obtain representative data, we preprocessed the datasets used to develop the Bayesian networks with the following modification of the Bayesian model developed by Banerjee et al. (1985).

Bayesian Models

The model is formulated as follows. Let S denote a particular hazardous constituent of interest. Because the concentration of the substance may vary from one well to another, each well must be considered separately. Let $\mathbf{x}_t = (x_{t1}, x_{t2}, x_{t3}, \ldots, x_{tm})$ be the vector of $m$ measurements of the concentration of S in $m$ distinct water samples from a given well on sampling occasion $t$, where $m \ge 1$ and $t = 1, 2, \ldots$. Each measurement consists of the true concentration of S plus an error. Let $X_t$ be the true concentration of S in the groundwater on sampling occasion $t$. Treating the unknown true concentration $X_t$ as a random variable, the model evaluates the posterior distribution of $X_t$ given the sample measurements $\mathbf{x}_t$ on sampling occasion $t$.
Published work on groundwater quality data has generally rested on the normality assumption. That is, given $X_t = \theta$ and $\sigma^2$, the concentration measurements in $\mathbf{x}_t$ represent a random sample of size $m$ from a normal distribution with mean $\theta$ and variance $\sigma^2$. Because the concentration of substance S in water samples obtained on different sampling occasions may vary considerably, we assume that the parameters $\theta$ and $\sigma^2$ of the normal distribution are random variables with certain prior probability distributions. To model these prior distributions, we used the natural conjugate family of distributions for sampling from a normal distribution. The model for the prior distribution of $X_t$ and $\sigma^2$ is therefore as follows. For $t = 1, 2, \ldots$ and given $\sigma^2$, the conditional distribution of $X_t$ on sampling occasion $t$ is normal with mean $\mu_{t-1}$ and variance $\tau_{t-1}^2\sigma^2$, and the marginal distribution of $\sigma^2$ is an inverted gamma distribution with parameters $\beta_{t-1}$ and $\nu_{t-1}$.

This model uses the following prior distribution, which represents the concentration measurements before the first sampling. The pdf of the prior distribution of $X_0$ is

$$f(x_0) \propto \left[1 + \frac{1}{2\nu_0}\left(\frac{x_0 - \mu_0}{\tau_0\sqrt{\beta_0/\nu_0}}\right)^2\right]^{-(2\nu_0+1)/2} \qquad (2.1)$$

which is the pdf of Student's t-distribution with $2\nu_0$ degrees of freedom, location parameter $\mu_0$, and squared scale $\tau_0^2\beta_0/\nu_0$.
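Equation (2.1) gives the density only up to its normalizing constant. As a minimal sketch, the Python function below (a hypothetical name) evaluates the full, normalized Student-t density with $2\nu_0$ degrees of freedom, location $\mu_0$, and scale $\tau_0\sqrt{\beta_0/\nu_0}$, working on the log scale for numerical stability. Reading the fourth prior parameter quoted in the algorithm (3056.34) as $\tau_0^2$ is an assumption about the notation.

```python
import math


def t_prior_pdf(x, mu0, tau0_sq, beta0, nu0):
    """Normalized Student-t density whose kernel is Eq. (2.1).

    Degrees of freedom k = 2 * nu0, location mu0, scale s = sqrt(tau0_sq * beta0 / nu0).
    """
    k = 2.0 * nu0
    s = math.sqrt(tau0_sq * beta0 / nu0)
    z = (x - mu0) / s
    # log of Gamma((k+1)/2) / (Gamma(k/2) * sqrt(k*pi) * s)
    log_const = (math.lgamma((k + 1.0) / 2.0) - math.lgamma(k / 2.0)
                 - 0.5 * math.log(k * math.pi) - math.log(s))
    # (1 + z^2 / k) ** (-(k + 1) / 2), computed on the log scale
    return math.exp(log_const - (k + 1.0) / 2.0 * math.log1p(z * z / k))


# Prior parameters from the algorithm; tau0_sq = 3056.34 is an assumed reading.
print(t_prior_pdf(9.53, mu0=9.53, tau0_sq=3056.34, beta0=0.0073, nu0=2.336))
```

If SciPy is available, `scipy.stats.t.pdf(x, df, loc=..., scale=...)` with `df = 2 * nu0` can serve as an independent cross-check of this function.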
Now suppose that observations on the concentration of S become available. Given the sample $\mathbf{x}_t$, the posterior marginal distribution of $X_t$ is Student's t-distribution with $2\nu_t$ degrees of freedom, location parameter $\mu_t$, and squared scale $\tau_t^2\beta_t/\nu_t$, with pdf

$$f(x_t \mid \mathbf{x}_t) \propto \left[1 + \frac{1}{2\nu_t}\left(\frac{x_t - \mu_t}{\tau_t\sqrt{\beta_t/\nu_t}}\right)^2\right]^{-(2\nu_t+1)/2} \qquad (2.2)$$

where

$$\begin{aligned}
\beta_t &= \beta_{t-1} + \frac{1}{2}\sum_{j=1}^{m}(x_{tj} - \bar{x}_t)^2 + \frac{m(\mu_{t-1} - \bar{x}_t)^2}{2(1 + m\tau_{t-1}^2)},\\
\nu_t &= \nu_{t-1} + \frac{m}{2},\\
\mu_t &= \frac{\mu_{t-1} + m\bar{x}_t\tau_{t-1}^2}{1 + m\tau_{t-1}^2},\\
\tau_t^2 &= \frac{\tau_{t-1}^2}{1 + m\tau_{t-1}^2},\\
\bar{x}_t &= \frac{1}{m}\sum_{j=1}^{m} x_{tj}.
\end{aligned} \qquad (2.3)$$

The sequential nature of this posterior distribution is evident from the equation for $\mu_t$: on each sampling occasion $t$, when new information about the concentration of S in the groundwater is received, the posterior distribution is revised, forming a recursive process. This updating may be continued indefinitely as new data $\mathbf{x}_t$ become available.

To represent the true unknown concentration of substance S in the well under consideration, it is often more convenient to give a range (interval) that contains most of the posterior probability. Such intervals are called highest posterior density (HPD) intervals. For a given probability content $1 - \alpha$, $0 < \alpha < 1$, a $100(1-\alpha)$ percent HPD interval for $X_t$ is

$$\mu_t \pm t_{2\nu_t}(\alpha/2)\,\tau_t\sqrt{\beta_t/\nu_t} \qquad (2.4)$$

where $t_{2\nu_t}(\alpha/2)$ is the $100(1-\alpha/2)$ percentile of Student's t-distribution with $2\nu_t$ degrees of freedom.

The Bayesian Algorithm

In brief, the monitoring algorithm based on the Bayesian model is as follows:

1. Fix a value of $\alpha$ ($0 < \alpha < 1$) based on the desired confidence level. In this case, we chose $\alpha = 0.01$.
2. Because we do not have enough data to estimate the prior, use the same prior parameters as the model of Banerjee, Plantinga, and Ramirez: $\beta_0 = 0.0073$, $\nu_0 = 2.336$, $\mu_0 = 9.53$, $\tau_0^2 = 3056.34$.
3. On each sampling occasion $t$ ($t = 1, 2, \ldots$), compute the parameters $\beta_t$, $\nu_t$, $\mu_t$, and $\tau_t^2$ of the posterior distribution of $X_t$ given the observations $\mathbf{x}_t$ on the concentration of S from a given well at a given site, using (2.3). Compute the lower and upper HPD limits (LHPD and UHPD) from these parameter estimates using (2.4).
4. Plot $\mu_t$, LHPD, and UHPD obtained in step 3 against sampling occasion $t$.
5. For the next sampling occasion, update $\beta_t$, $\nu_t$, $\mu_t$, and $\tau_t^2$ using (2.3) and the newly obtained data, recompute LHPD and UHPD from the updated values using (2.4), and repeat step 4.

We applied this algorithm to the datasets collected from Salalah in the Sultanate of Oman. The data from each well, taken as a whole, are not expected to be normal, but each sample is taken from a normal distribution. Some of these datasets had to be scaled down to simplify the process and to obtain smooth graphs that are easy to study. For this purpose, we used the following normalization:

$$z_i = \frac{x_i - \bar{x}}{s}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}}$$

Algorithm Implementation

The preprocessing system is implemented on a PC platform in the Visual Basic programming language. Figure 3 shows the main window, where the user may enter the location, the substance to be studied, the well number, the sampling occasion, and the measured concentration.

FIGURE 3 The main window of the preprocessing module.

Tables 1 and 2 present the concentration data for TDS (total dissolved solids) and pH, respectively, for Well 001/577 in the Taqah area. In particular, the tables show the true concentration data for TDS and pH produced by our preprocessing system.
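The recursion (2.3), the HPD interval (2.4), the monitoring loop of steps 3 to 5, and the normalization can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Visual Basic system described here: all class and function names are hypothetical, and the fourth prior parameter is read as $\tau_0^2$. The Student-t percentile is passed in as a function because the Python standard library has no t quantile.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Posterior:
    beta: float  # inverted-gamma parameter beta_t
    nu: float    # inverted-gamma parameter nu_t (X_t has 2 * nu_t degrees of freedom)
    mu: float    # location mu_t of the true concentration X_t
    tau2: float  # tau_t^2, so that Var(X_t | sigma^2) = tau_t^2 * sigma^2


def update(prev: Posterior, xs: Sequence[float]) -> Posterior:
    """One step of recursion (2.3) given the m measurements of one occasion."""
    m = len(xs)
    if m < 1:
        raise ValueError("each sampling occasion needs m >= 1 measurements")
    xbar = sum(xs) / m
    ss = sum((x - xbar) ** 2 for x in xs)
    denom = 1.0 + m * prev.tau2
    return Posterior(
        beta=prev.beta + ss / 2.0 + m * (prev.mu - xbar) ** 2 / (2.0 * denom),
        nu=prev.nu + m / 2.0,
        mu=(prev.mu + m * xbar * prev.tau2) / denom,
        tau2=prev.tau2 / denom,
    )


def hpd(post: Posterior, t_crit: float) -> Tuple[float, float]:
    """Interval (2.4): mu_t -/+ t_crit * tau_t * sqrt(beta_t / nu_t)."""
    half = t_crit * math.sqrt(post.tau2 * post.beta / post.nu)
    return post.mu - half, post.mu + half


def monitor(prior: Posterior, occasions: Sequence[Sequence[float]], alpha: float,
            t_quantile: Callable[[float, float], float]) -> List[Tuple[float, float, float]]:
    """Steps 3-5: return (LHPD, mu_t, UHPD) for every sampling occasion in order."""
    rows: List[Tuple[float, float, float]] = []
    post = prior
    for xs in occasions:
        post = update(post, xs)  # the posterior becomes the prior for the next occasion
        t_crit = t_quantile(1.0 - alpha / 2.0, 2.0 * post.nu)
        lo, hi = hpd(post, t_crit)
        rows.append((lo, post.mu, hi))
    return rows


def standardize(xs: Sequence[float]) -> List[float]:
    """Scale a series to zero mean and unit sample standard deviation."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt((sum(x * x for x in xs) - n * xbar * xbar) / (n - 1))
    return [(x - xbar) / s for x in xs]
```

For example, after `from scipy.stats import t`, calling `monitor(Posterior(beta=0.0073, nu=2.336, mu=9.53, tau2=3056.34), occasions, 0.01, t.ppf)`, where `occasions` is a list of per-occasion measurement lists, returns one (LHPD, $\mu_t$, UHPD) row per occasion, the kind of quantities reported in Tables 1 and 2; this sketch has not been checked against those values. The percentile is passed as a function rather than a fixed critical value because the degrees of freedom $2\nu_t$ grow by $m$ at every occasion.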
TABLE 1 Concentration Data of TDS for Well 001/577 in the Salalah Plain
(Observed = observed concentration; Expected true = expected true concentration)

Date  Observed  LHPD  Expected true  UHPD
84    1.147     0.85  1.15           1.45
85    1.106     1.00  1.13           1.26
86    1.938     1.12  1.40           1.68
87    2.237     1.33  1.61           1.88
88    3.857     1.60  2.06           2.52
89    3.834     1.91  2.35           2.79
90    3.957     2.18  2.58           2.98
91    3.761     2.38  2.73           3.08
92    4.3       2.58  2.90           3.23
93    3.958     2.72  3.01           3.30
94    1         2.54  2.83           3.11
95    3.714     2.64  2.90           3.16
96    3.65      2.73  2.96           3.19
97    3.381     2.78  2.99           3.20
98    3.396     2.83  3.02           3.20
99    3.477     2.87  3.04           3.22
00    3.498     2.91  3.07           3.23
01    3.23      2.93  3.08           3.23
02    3.243     2.95  3.09           3.22
03    3.267     2.97  3.10           3.22
04    3.297     2.99  3.11           3.22

TABLE 2 Concentration Data of pH for Well 001/577 in the Salalah Plain
(Observed = observed concentration; Expected true = expected true concentration)

Date  Observed  LHPD  Expected true  UHPD
84    7.8       7.6   7.8            8.0
85    7.7       7.65  7.75           7.85
86    7.2       7.38  7.57           7.76
87    7         7.24  7.43           7.61
88    7         7.18  7.34           7.5
89    7.2       7.19  7.32           7.44
90    7         7.16  7.27           7.38
91    7         7.14  7.24           7.33
92    7.6       7.19  7.28           7.36
93    7.4       7.22  7.29           7.36
94    7.4       7.24  7.30           7.36
95    7         7.22  7.28           7.33
96    6.1       7.11  7.19           7.26
97    6.7       7.08  7.15           7.22
98    7.1       7.08  7.15           7.21
99    5.7       6.97  7.06           7.14
00    5.7       6.89  6.98           7.07
01    6.3       6.85  6.94           7.02
02    6.2       6.82  6.90           6.98
03    6.4       6.80  6.88           6.95
04    6.4       6.78  6.85           6.92

FIGURE 4 Monitoring chart of TDS for Well MW1, Salalah plain.

Figures 4 and 5, which plot Tables 1 and 2, respectively, show whether the three quantities (expected true concentration, LHPD, and UHPD) lie within the minimum and maximum levels allowed for TDS and pH. These figures provide a rudimentary prediction of groundwater quality. For example, Figure 5 shows that well MW1 is contaminated, because the expected true pH for this well falls below the allowed level, meaning that the water is acidic.
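The comparison that Figures 4 and 5 support visually can also be made programmatically. The Python sketch below uses a hypothetical decision rule that the paper does not state: an occasion is flagged only when the whole HPD interval lies outside the allowed range, and is reported as uncertain when the interval straddles a limit. The pH limits 7.0–8.5 come from Table 3, and the three rows come from Table 2 (years written in full).

```python
def compare_with_limits(lhpd, uhpd, lower, upper):
    """Classify an HPD interval [lhpd, uhpd] against an allowed range [lower, upper].

    Hypothetical rule: only an interval lying entirely outside the range is flagged.
    """
    if uhpd < lower:
        return "below limit"
    if lhpd > upper:
        return "above limit"
    if lower <= lhpd and uhpd <= upper:
        return "within limits"
    return "interval straddles a limit"


PH_LIMITS = (7.0, 8.5)  # drinking-water pH range from Table 3

# (LHPD, expected true pH, UHPD) for Well 001/577, from Table 2
table2_rows = {
    1984: (7.6, 7.8, 8.0),
    1999: (6.97, 7.06, 7.14),
    2004: (6.78, 6.85, 6.92),
}

for year, (lo, expected, hi) in table2_rows.items():
    print(year, expected, compare_with_limits(lo, hi, *PH_LIMITS))
```

With these rows, 1984 is within limits, 1999 straddles the lower limit, and 2004 lies entirely below pH 7.0, which agrees with the reading of Figure 5 given above.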
FIGURE 5 Monitoring chart of pH for Well MW1, Salalah plain.

BAYESIAN NETWORKS

After the preprocessing stage, we constructed a Bayesian network (BN) and used it as the base network for constructing two Dynamic Bayesian Networks to predict the impact of pollution on groundwater quality.

Bayesian Belief Networks (BBNs)

BBNs are effective and practical representations of knowledge for reasoning under uncertainty. They have been applied successfully in domains such as diagnosis, prediction, planning, learning, vision, and natural language understanding (Nicholson and Brady 1994; Russell and Norvig 2003). BNs (see Figure 6) are graphical structures used for representing expert knowledge, drawing conclusions from input data, and explaining the reasoning process to the user. These networks are also called knowledge maps, probabilistic causal networks, and qualitative probabilistic networks (Jensen 2001), and they have become increasingly popular knowledge representations for reasoning under uncertainty.

A BN is a directed acyclic graph (DAG) whose structure corresponds to the dependency relations among the set of variables represented in the network (the nodes). Each node in a belief network represents a random variable, or uncertain quantity, that can take two or more possible values. The arcs signify the existence of direct influences between the linked variables, and the strengths of these influences are quantified by conditional probabilities. These links can be given a causal meaning. The graph in Figure 6 represents the following joint probability distribution of the variables V, Y, U, W, X, and T:

$$P(U, V, Y, W, X, T) = P(T \mid W)\,P(X \mid W)\,P(W \mid V, Y)\,P(U \mid V)\,P(V \mid Y)\,P(Y)$$

FIGURE 6 A simple BN.

This result is obtained by applying the chain rule and using the dependency information represented in the network.
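The factorization can be checked numerically. The Python sketch below assumes that every node of Figure 6 is binary and fills the conditional probability tables with made-up numbers (the paper does not give them); it evaluates the product above and sums it over all 64 joint assignments, which must give 1 for a valid chain-rule factorization.

```python
from itertools import product

# Hypothetical CPTs for Figure 6 with all six nodes binary; each value is
# P(node = True | parent values). The paper does not give these numbers.
P_Y = 0.3
P_V_GIVEN_Y = {True: 0.8, False: 0.1}
P_U_GIVEN_V = {True: 0.7, False: 0.2}
P_W_GIVEN_VY = {(True, True): 0.9, (True, False): 0.6,
                (False, True): 0.5, (False, False): 0.05}
P_X_GIVEN_W = {True: 0.75, False: 0.1}
P_T_GIVEN_W = {True: 0.85, False: 0.15}


def bern(p_true, value):
    """P(node = value) for a binary node with P(node = True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(u, v, y, w, x, t):
    """P(U,V,Y,W,X,T) = P(T|W) P(X|W) P(W|V,Y) P(U|V) P(V|Y) P(Y)."""
    return (bern(P_T_GIVEN_W[w], t) * bern(P_X_GIVEN_W[w], x)
            * bern(P_W_GIVEN_VY[(v, y)], w) * bern(P_U_GIVEN_V[v], u)
            * bern(P_V_GIVEN_Y[y], v) * bern(P_Y, y))


# Summing the factorized joint over all 2**6 assignments should give 1.
total = sum(joint(*values) for values in product((True, False), repeat=6))
print(total)
```

Marginals and conditional probabilities, for example $P(T)$ or $P(Y \mid T)$, follow by summing this joint over the variables that are not of interest; this brute-force enumeration is the naive counterpart of the propagation methods that BN tools use.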
$P(Y)$ is called the prior probability, and $P(T \mid W)$, $P(X \mid W)$, $P(W \mid V, Y)$, $P(U \mid V)$, and $P(V \mid Y)$ are called conditional probabilities. Prior probabilities, which are based on initial information, can be obtained from statistical data as relative frequencies, whereas conditional probabilities can be elicited from experts or calculated with various types of mathematical models.

Within a Bayesian network, the basic computation is to calculate the belief of each node (its conditional probability given the evidence) from the evidence that has been observed. This consists of instantiating the input variables and propagating their effect through the network to update the probabilities of the hypothesis variables. An important purpose of BNs is to facilitate the calculation of arbitrary conditional probabilities. Various techniques have been developed for evaluating node beliefs and performing probabilistic inference; the most popular methods are due to Pearl (1988). Similar techniques have been developed for constraint networks in the Dempster-Shafer formalism (Russell and Norvig 2003; Jensen 2001).

We examined the dependencies within the network model to establish weak and strong influences among the variables and to find the variables that matter most for water quality. This procedure helps in forming heuristics that are cost-effective and useful not only for probabilistic inference but also for automatically constructing a belief network from data.

Dynamic Bayesian Networks (DBNs)

The problem of assessing and forecasting water quality requires modeling not only the static probabilistic dependencies between its constituents (variables) but also their dynamic behavior. DBNs can easily capture both (Kevin and Nicholson 2004); they extend BNs from static to dynamic domains (Nicholson and Brady 1994; Brandherm and Jameson 2004).
A static BN can be extended to a DBN by introducing relevant temporal dependencies between representations of the static network at different times. In contrast to time series models that use regression to represent correlations, DBNs represent the temporal causal relationships between variables. DBNs can therefore express more general dependency models that capture richer and more realistic dynamic dependencies as well as the traditional static belief-network dependencies (Kim et al. 2004; Nefian et al. 2002).

A series of BNs, acting as time slices, can be connected to create a DBN. As new evidence is added to a DBN, new time slices are added. To reduce computational complexity, old time slices are commonly removed and their information summarized in the prior probabilities of the following slices, which produces a moving window of slices.

The main characteristics of DBNs are as follows. Let $X_t$ be the state of the system at time $t$, and assume that:

1. The process is Markovian, i.e., $P(X_t \mid X_0, X_1, \ldots, X_{t-1}) = P(X_t \mid X_{t-1})$.
2. The process is stationary (time-invariant), i.e., $P(X_t \mid X_{t-1})$ is the same for every $t$.

To specify a DBN, we therefore need only $P(X_0)$, which is a static BN, and $P(X_t \mid X_{t-1})$, which is a network fragment in which the variables in $X_{t-1}$ have no parents.

DBNs can be used effectively and economically to monitor and predict complex situations that change over time, such as water quality. For example, they have recently been used to predict outcomes in critically ill patients. They have also been used to monitor and control highway traffic (Forbes et al. 1995), to identify gene regulatory networks from microarray data (Zou and Conzen 2005), and to predict river and lake water pollution (Lamon and Stow 2004; Shihab and Al-Chalabi 2004; Stow et al. 2003).
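The Markov and stationarity assumptions are what make a moving window possible: the belief about the current slice summarizes all earlier slices. The Python sketch below illustrates this with forward filtering in a hypothetical two-state DBN (a hidden water-quality state and one pass/fail test result per occasion). The states, $P(X_0)$, the transition model, and the observation model are all invented for illustration and are not the networks built in this work.

```python
# Hypothetical two-state DBN: the hidden state X_t is the water-quality state of a
# well and E_t is a pass/fail test result observed on each sampling occasion.
STATES = ("good", "poor")
P_X0 = {"good": 0.8, "poor": 0.2}                 # P(X_0): the initial static slice
P_TRANS = {"good": {"good": 0.9, "poor": 0.1},    # P(X_t | X_{t-1}), identical for all t
           "poor": {"good": 0.2, "poor": 0.8}}
P_OBS = {"good": {"pass": 0.85, "fail": 0.15},    # P(E_t | X_t)
         "poor": {"pass": 0.3, "fail": 0.7}}


def predict(belief):
    """Advance one slice by summing over X_{t-1} with the transition model."""
    return {x: sum(belief[prev] * P_TRANS[prev][x] for prev in STATES) for x in STATES}


def correct(belief, obs):
    """Condition the predicted slice on the evidence observed in that slice."""
    unnormalized = {x: belief[x] * P_OBS[x][obs] for x in STATES}
    z = sum(unnormalized.values())
    return {x: p / z for x, p in unnormalized.items()}


def filter_beliefs(observations):
    """Forward filtering: only the current belief is kept, so each old slice is
    summarized into the prior of the next one (a moving window of width one)."""
    belief = dict(P_X0)
    history = []
    for obs in observations:
        belief = correct(predict(belief), obs)
        history.append(belief)
    return history


for t, b in enumerate(filter_beliefs(["pass", "fail", "fail"]), start=1):
    print(t, round(b["poor"], 3))
```

Under these invented numbers, the belief that the well is in the poor state is about 0.10 after a pass and rises to about 0.49 and then 0.79 after two consecutive failures.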
Inference is performed as if the network were a normal BN, although DBNs usually result in larger and more complex networks that require more computation to update. Several researchers have recently adapted standard belief-network representation and inference techniques to support temporal reasoning. Dagum and Galper (1993) and Guisan et al. (2002), for example, introduced additive generalizations of belief-network representation and inference techniques. They integrated these techniques with the fundamental methods of Bayesian time series analysis to generate a dynamic network model, which was applied to predict the progress of patients in a surgical intensive care unit. Other techniques, developed by Shortliffe and his colleagues (1990), have been applied to diagnosis in internal medicine, diagnosis of gas turbines in power generation, and text retrieval from a large body of writing.

The temporal repetition of identical model structures encourages the integration of object-oriented techniques with Bayesian networks. This modeling technique has received increasing interest in the literature over the past decade. It started with methods for reusing elements of network specifications and dividing large networks into smaller pieces. These and other successful object-oriented Bayesian network (OOBN) models and their applications to real-world problems greatly encouraged us to develop a model and a computer system based on the OOBN representation to assess and predict water quality. We therefore used the Hugin (HUGIN Expert 2008) and dHugin (Kjaerulff 1995) tools to implement our Bayesian networks. The Hugin system supports OOBNs and treats a BN as a special case of an OOBN (the base network); the other networks in the OOBN are nodes that represent instances of the base network.
On the other hand, dHugin (Kjaerulff 1995), implemented on top of Hugin, is based on a message-passing algorithm in junction trees, a version of probability updating in singly connected directed acyclic graphs (DAGs) (Jensen 2001). Inference, i.e., the probability update, over the current time window and the time slices preceding it is performed by passing messages between junction trees with this algorithm.

Bayesian Networks Development

As mentioned above, this study covers the Taqah area (see Figure 2), which is the main part of the Salalah plain. This area extends from the foothills of the mountains to the arid desert. The desert here is of two types: the semi-desert (Badiyah) and the arid desert (Al Sahra). Some of the rural areas around Taqah receive some of the drizzle that falls on Salalah during the rainy season (Khareef). Of more than twenty wells in the Taqah area, only four were selected for analysis. These four wells have the most complete data measurements and provide sufficient information to assess the groundwater quality of the selected basin. Moreover, all the other wells in the Taqah area are close to each other; we therefore ignored them because they add no additional information.

Identifying the domain variables (pollution constituents) and the causal relationships between them constitutes the main part of the development process. In our study, we considered only the dependencies between TDS, EC, and water pH. In the Sultanate of Oman, these are the main factors that researchers in the area have been dealing with, and good data are therefore maintained for them. We used our literature-based network structure as a starting point for discussion with these researchers, both to explain the BN approach and to obtain their input.
In addition, we analyzed the data collected from many wells, and the results revealed that these chemical parameters are useful indicators of groundwater quality because they account for most of the variance in the data.

The EC of the water has been used as a measure of the salinity hazard of groundwater used for irrigation in the Salalah plain. According to international water quality standards, irrigation water with EC values up to 1 mS/cm is safe for all crops, values between 1 and 3 mS/cm are acceptable, and values higher than 3 mS/cm restrict the use of the water for many irrigated crops. Changes in conductivity can be caused by changes in soil water content and by soil or groundwater contamination.

The TDS limit is 600 mg/L, which is the objective of the current Plan of the MWR. TDS comprises many dissolved solids, but 90% of its concentration is made up of six constituents: sodium (Na), magnesium (Mg), calcium (Ca), chloride (Cl), bicarbonate (HCO₃), and sulfate (SO₄). We therefore considered only these constituents in calculating TDS, which is represented as a node without parents in the network structure. This simplification is necessary to make the problem tractable and to keep it consistent with the available data without losing information. Other factors that are considered less significant to groundwater quality in Oman were not recorded and were therefore neglected in this study.

We also used the following relationship between TDS and EC (Wu-Seng 1993):

$$\mathrm{TDS} = A \cdot \mathrm{EC}$$

where A is a constant with a value between 0.75 and 0.77. Both TDS and EC can affect water acidity, i.e., water pH. Dissolved chemical constituents tend to occur in higher concentrations at lower pH (higher acidity). On the other hand, acidity allows the migration of hydrogen ions (H⁺), which is reflected in conductivity. Our work therefore concentrated on the following relations:

TDS → EC, EC → pH, TDS → pH.
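The relation TDS = A·EC and the irrigation classes quoted above are simple enough to express directly. In this Python sketch the function names are hypothetical; assigning exactly 1 and exactly 3 mS/cm to the lower class is an assumption, since the text does not say which class the boundaries belong to, and the units of A follow whatever units the source used for TDS and EC.

```python
def tds_bounds_from_ec(ec, a_low=0.75, a_high=0.77):
    """Range of TDS implied by TDS = A * EC with A between a_low and a_high."""
    return a_low * ec, a_high * ec


def irrigation_class(ec_ms_per_cm):
    """Salinity class of irrigation water from its EC in mS/cm (boundaries assumed inclusive)."""
    if ec_ms_per_cm <= 1.0:
        return "safe for all crops"
    if ec_ms_per_cm <= 3.0:
        return "acceptable"
    return "restricted for many crops"


print(tds_bounds_from_ec(800.0))
print(irrigation_class(0.8), irrigation_class(2.5), irrigation_class(3.4))
```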
Shihab TABLE 3 Drinking Water Standard Element Limit for drinking water pH 7.0–8.5 Chloride mg=l 250 TDS 500–1000 Sulphate mg=l 200 Copper mg=l 1.3 Iron mg=l 0.5 Sodium 200–400 Knowing that the maximum allowable TDS in the drinking water is 600 mg=l, Table 3 shows the limits for a number of constituents of drinking water. The data sample is divided into two intervals (categories), consider- ing TDS ¼ 550 is the central point. Thus, the first category has TDS < 550 and the second category has TDS 550. For EC, we also divide the data sample into two categories: data with EC < 670 and data with EC 670. Regarding pH, we also divided the data sample into two categories, data with pH < 7.5 and data with pH > 7.5. The data table and the probability tables produced by this analysis for two wells are as in Table 3. Table 4 shows the monitoring measurements of the main components of TDS along with the measurements of EC and pH for Well 001=577. To analyze the relations mentioned above, the following probabilities were calculated: PðTDS < 550Þ¼ 0:556 and PðTDS 550Þ¼ 0:444: From the relationship between TDS and EC, the conditional probability presented in Table 5 was produced. Table 6 shows the conditional probability table that shows the conditional probability of pH given TDS and EC. Similarly, we obtained the conditional probability tables for other wells. 
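The EC irrigation limits, the six-constituent TDS approximation and the TDS-EC relation above are simple enough to encode directly. The sketch below is illustrative only. The constituent keys and the default A = 0.76 (the midpoint of the quoted range) are assumptions. EC must be supplied in whatever unit the constant A was calibrated for, which the text does not state.

```python
# Helpers for the EC/TDS quantities described in the text (illustrative sketch).
SIX_MAJOR_CONSTITUENTS = ("Na", "Mg", "Ca", "Cl", "HCO3", "SO4")


def tds_from_constituents(conc_mg_per_l):
    """Approximate TDS (mg/L) as the sum of the six major constituents."""
    missing = [c for c in SIX_MAJOR_CONSTITUENTS if c not in conc_mg_per_l]
    if missing:
        raise KeyError(f"missing constituents: {missing}")
    return sum(conc_mg_per_l[c] for c in SIX_MAJOR_CONSTITUENTS)


def tds_from_ec(ec, a=0.76):
    """TDS = A * EC, with A restricted to the 0.75-0.77 range quoted in the text.

    The default 0.76 (midpoint) is an assumption; EC must be in the unit
    for which A was calibrated.
    """
    if not 0.75 <= a <= 0.77:
        raise ValueError("A outside the 0.75-0.77 range given in the text")
    return a * ec


def irrigation_class(ec_ms_per_cm):
    """Irrigation suitability from EC (mS/cm) using the 1 and 3 mS/cm limits."""
    if ec_ms_per_cm < 0:
        raise ValueError("EC cannot be negative")
    if ec_ms_per_cm <= 1.0:
        return "safe for all crops"
    if ec_ms_per_cm <= 3.0:
        return "acceptable"
    return "restricted for many crops"
```

For example, with hypothetical concentrations of 10, 20, 30, 40, 50 and 60 mg/L for the six constituents, `tds_from_constituents` returns 210. Because these six make up only about 90% of TDS, the sum slightly underestimates the true value.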
TABLE 4 TDS and EC Data for Well 001/577

Year   Mg      SO₄   Na      Ca       Cl      HCO₃     TDS       EC
1994   17      14    31      96.11    51      186.9    2203.11   671
1995   7       11    20      60       35      124.4    2128      386
1996   16      25    28.5    52       60.91   156.8    2178.41   491.25
1997   12.62   19    30.3    101.65   51      314.15   2211.57   741
1998   11.75   16    32      92.6     62      114.7    2212.35   430
1999   13.57   15    26.47   98.31    55      336.6    2207.35   726.5
2000   14.4    20    27.7    104      59      310.2    2225.1    668
2001   15.5    23    32.2    135      61      320.5    2267.7    727
2002   12.4    17    28.1    115      67      315.6    2241.5    716.25
2003   13.7    20    29.5    121      68      316.5    2255.2    753
2004   14.7    21    31      130      71      318      2271.7    796

TABLE 5 TDS, EC, pH, COD and NO₃ Data for Well 001/577

Year   TDS (mg/L)   EC (mS/cm)   pH     COD (mg/L)   NO₃ (mg/L)
1984   542.7        548          7.3    11.7         17.26
1985   525.5        548          7.8    16.5         17.14
1986   565.4        579          7.75   15.2         17.62
1987   604.2        588          7.57   14.5         18.93
1988   541.8        601          7.43   14.7         20.16
1989   565.9        625          7.34   13.8         19.39
1990   558.6        638          7.32   13.0         18.74
1991   640.4        798          7.27   12.8         17.67
1992   754.5        739          7.24   12.2         20.71
1993   798.7        758          7.28   13.9         15.38
1994   746.4        799          7.29   14.5         18.15
1995   615.8        514          7.3    14.6         31.95
1996   737.5        619          7.28   14.3         32.13
1997   753.6        869          7.19   12.7         42.95
1998   935.6        558          7.15   12.6         42.73
1999   1174         855          7.15   12.6         43.20
2000   1021         796          7.06   11.3         41.22
2001   1067         855          6.98   5.7          42.35
2002   1223         844          6.94   5.3          42.58
2003   1055         881          7.14   12.5         40.65
2004   1143         926          7.55   16.7         42.81

Nitrate (NO₃) is an increasingly important indicator of water pollution from animal waste, human waste, fertilizers, and solid waste. Nitrate and ammonium are indicators of pollution because both are soluble in water and can penetrate to deeper zones underground. NO₃ concentrations are classified into three classes (MOE, 2008):

• First class: 0 to 10 mg/l (the natural level of NO₃ in groundwater; excellent water for drinking).
• Second class: 10 to 45 mg/l (drinkable but contaminated).
• Third class: higher than 45 mg/l (contaminated and non-drinkable).
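The discretization described earlier turns each yearly record into three binary states, so the network probabilities become relative frequencies. The sketch below applies the stated cut points (550, 670, 7.5) to the Well 001/577 series in Table 5. It estimates P(TDS < 550) and P(pH < 7.5 | TDS, EC) by counting, and also applies the three nitrate classes.

The text does not say which samples produced its reported P(TDS < 550) = 0.556 or the Table 6 values. The frequencies computed here are therefore illustrative and do not reproduce those numbers. The optional `alpha` (Laplace smoothing) and the inclusive class boundaries are our assumptions.

```python
from collections import Counter

# (year, TDS mg/L, EC, pH, NO3 mg/L) for Well 001/577, transcribed from Table 5.
TABLE5 = [
    (1984, 542.7, 548, 7.30, 17.26), (1985, 525.5, 548, 7.80, 17.14),
    (1986, 565.4, 579, 7.75, 17.62), (1987, 604.2, 588, 7.57, 18.93),
    (1988, 541.8, 601, 7.43, 20.16), (1989, 565.9, 625, 7.34, 19.39),
    (1990, 558.6, 638, 7.32, 18.74), (1991, 640.4, 798, 7.27, 17.67),
    (1992, 754.5, 739, 7.24, 20.71), (1993, 798.7, 758, 7.28, 15.38),
    (1994, 746.4, 799, 7.29, 18.15), (1995, 615.8, 514, 7.30, 31.95),
    (1996, 737.5, 619, 7.28, 32.13), (1997, 753.6, 869, 7.19, 42.95),
    (1998, 935.6, 558, 7.15, 42.73), (1999, 1174.0, 855, 7.15, 43.20),
    (2000, 1021.0, 796, 7.06, 41.22), (2001, 1067.0, 855, 6.98, 42.35),
    (2002, 1223.0, 844, 6.94, 42.58), (2003, 1055.0, 881, 7.14, 40.65),
    (2004, 1143.0, 926, 7.55, 42.81),
]
TDS_CUT, EC_CUT, PH_CUT = 550.0, 670.0, 7.5   # cut points from the text


def discretize(tds, ec, ph):
    """Binary states used in the networks: True means below the cut point."""
    return tds < TDS_CUT, ec < EC_CUT, ph < PH_CUT


def estimate(rows, alpha=0.0):
    """Relative-frequency estimates of P(TDS < 550) and P(pH < 7.5 | TDS, EC).

    CPT keys are (tds_low, ec_low). alpha > 0 adds Laplace smoothing (our
    addition); with alpha = 0 an unobserved parent configuration maps to None.
    """
    states = [discretize(tds, ec, ph) for _, tds, ec, ph, _ in rows]
    p_tds_low = sum(1 for tds_low, _, _ in states if tds_low) / len(states)
    parents = Counter((t, e) for t, e, _ in states)
    low_ph = Counter((t, e) for t, e, p in states if p)
    cpt = {}
    for key in [(True, True), (True, False), (False, True), (False, False)]:
        total = parents[key] + 2 * alpha
        cpt[key] = None if total == 0 else (low_ph[key] + alpha) / total
    return p_tds_low, cpt


def nitrate_class(no3_mg_per_l):
    """NO3 class 1 (0-10), 2 (10-45) or 3 (> 45 mg/l); inclusive upper bounds assumed."""
    if no3_mg_per_l < 0:
        raise ValueError("concentration cannot be negative")
    if no3_mg_per_l <= 10:
        return 1
    if no3_mg_per_l <= 45:
        return 2
    return 3
```

With `alpha = 0`, the configuration TDS < 550 with EC ≥ 670 never occurs in this series, so its CPT entry stays undefined (`None`). Smoothing is one common way to fill such gaps before the table is loaded into a network.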
The statistical analysis reveals a correlation between salinity, as represented by TDS, and nitrate. This correlation allows us to incorporate nitrate into our Bayesian models. TDS reflects not only the water-rock interaction and the residence time but also sources of contamination. Nitrate, by contrast, reflects only contamination, whether organic or inorganic.

TABLE 6 The Conditional Probability Table (CPT) of pH Given TDS and EC, Well 577

              TDS < 550             TDS ≥ 550
              EC < 670   EC ≥ 670   EC < 670   EC ≥ 670
pH < 7.5      0.333      0          0          0.667
pH ≥ 7.5      0.667      1          1          0.333

The chemical oxygen demand (COD), which reflects the organic and inorganic content of the water, is also an important constituent in the pollution assessment of groundwater in Oman. The positive correlation between COD and pH allowed our Bayesian networks to include this constituent. COD is classified into five classes (MOE, 2008):

• excellent water for drinking (0 to 1 mg/l);
• suitable for bathing (1 to 3 mg/l);
• suitable for agriculture (3 to 5 mg/l);
• industrial water (5 to 8 mg/l);
• very highly contaminated water (higher than 8 mg/l).

In this study, however, we considered only the first three classes.

We processed the datasets for the other wells in the same way to build a static Bayesian network (BN) for each well. We tested these BNs with different values of TDS, EC, pH, NO₃, and COD taken from the data collected for the four wells selected for the development of this predictive model. Once the static BN model for each monitoring well had been built, parameterized, and tested, we used these models as initial building networks. From them we constructed two object-oriented Bayesian networks (OOBNs) for groundwater quality prediction.
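Table 6 and the reported P(TDS < 550) = 0.556 give enough to compute two queries: the marginal of pH, and the posterior of TDS given an observed pH. Both follow by summing the joint distribution over the TDS → EC, TDS → pH, EC → pH structure. The CPT of EC given TDS is not shown here, so the values used for it below are hypothetical.

The last function is a simplified, one-parameter version of the kind of query SamIam answers (discussed later in this section). It finds the closest value of P(TDS < 550) for which P(pH < 7.5) reaches a target. It is not SamIam's API or algorithm.

```python
from itertools import product

P_TDS_LOW = 0.556                               # reported P(TDS < 550)
P_EC_LOW_GIVEN_TDS = {True: 0.8, False: 0.3}    # HYPOTHETICAL P(EC < 670 | tds_low)
P_PH_LOW_GIVEN = {                              # Table 6: P(pH < 7.5 | tds_low, ec_low)
    (True, True): 0.333, (True, False): 0.0,
    (False, True): 0.0, (False, False): 0.667,
}
NAMES = ("TDS_low", "EC_low", "pH_low")


def joint(tds_low, ec_low, ph_low, p_tds_low=P_TDS_LOW):
    """Chain-rule factorization P(TDS) P(EC | TDS) P(pH | TDS, EC)."""
    p_t = p_tds_low if tds_low else 1.0 - p_tds_low
    p_e = P_EC_LOW_GIVEN_TDS[tds_low]
    p_e = p_e if ec_low else 1.0 - p_e
    p_p = P_PH_LOW_GIVEN[(tds_low, ec_low)]
    p_p = p_p if ph_low else 1.0 - p_p
    return p_t * p_e * p_p


def query(target, evidence, p_tds_low=P_TDS_LOW):
    """P(target is True | evidence) by enumerating all eight joint states."""
    num = den = 0.0
    for values in product((True, False), repeat=3):
        world = dict(zip(NAMES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue
        weight = joint(*values, p_tds_low=p_tds_low)
        den += weight
        if world[target]:
            num += weight
    if den == 0.0:
        raise ValueError("evidence has zero probability")
    return num / den


def min_change_for_target(target_prob, p_now=P_TDS_LOW):
    """Closest P(TDS < 550) giving P(pH < 7.5) >= target_prob, or None if impossible."""
    if query("pH_low", {}, p_now) >= target_prob:
        return p_now
    # P(pH < 7.5) = a * p + b * (1 - p) is linear in p = P(TDS < 550).
    a = query("pH_low", {"TDS_low": True})
    b = query("pH_low", {"TDS_low": False})
    if a == b:
        return None
    p_star = (target_prob - b) / (a - b)
    return p_star if 0.0 <= p_star <= 1.0 else None
```

For example, `query("TDS_low", {"pH_low": True})` gives the updated belief that TDS is below 550 once a low pH has been observed. `min_change_for_target(0.4)` shows how far the TDS prior would have to move for low pH to reach a 40% probability.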
The first OOBN, shown in Figure 7, models the successive time slices of each well, capturing their temporal behavior with identical model structures. The initial building network (Figure 8) describes a generic time slice.

FIGURE 7 The OOBN representing three time-sliced networks.

FIGURE 8 The initial building block representing one time-sliced network.

The second OOBN consists of four identical initial BNs (each representing one well), interconnected to cover the whole study area. Figure 9 shows a typical OOBN representing four monitoring wells in the Salalah plain. Figure 10 shows the initial building network from which this OOBN is generated.

FIGURE 9 The OOBN representing four wells.

FIGURE 10 The initial building network for constructing identical interconnected networks.

Temporal Networks

As mentioned above, the initial building network of the system is a single time step representing one year. It models the data sampled from each of the four wells selected for this study. This one-time-step network fragment, shown in Figure 10, represents a class in the object-oriented paradigm. Objects (entities with identity, state, and behavior) are instances of classes, which correspond to type declarations in traditional programming languages. In this context, a class describes an object by its structure, behavior, and attributes. Whenever an object of a class is needed, an instance of that class is created. The initial building network is therefore a class containing the following three sets of nodes:

• A set of input nodes: these act as placeholders for parents of nodes inside instances of the class. They cannot have parents within the class.
• A set of output nodes: these are connected to the input nodes of the next time-slice network, so they can be parents of nodes outside instances of the class.
• A set of protected nodes: these can have parents and children only inside the class itself.

The input and output nodes are collectively referred to as interface nodes; see Kjaerulff (1995) and Nicholson and Brady (1994) for more details. The final OOBN is constructed by creating instance networks (objects) from the basic building network, spanning a number of time slices. Figure 8 shows a single time-slice class for well MW1, and Figure 7 shows an OOBN representing three time-sliced networks.

Basin Monitoring

We model each of the four selected wells in the Taqah area with an initial building network (generic model). These models are identical both qualitatively and quantitatively (in structure and in conditional probability tables). They can therefore be represented by a single class containing input, output, and protected nodes. Each object is an instance of this class representing one well, and the instances are interconnected to cover the whole basin. Figure 9 shows an OOBN representing the four monitoring wells in the Taqah area.

Improving the Conditional Probability Table (CPT) Derivation Process

One of the problems in developing BBNs for groundwater quality is the inherent difficulty of establishing CPTs. Methods are therefore needed that make it easier to select and justify CPT values. Studies have shown that sensitivity analysis can identify the relative importance of the parameters in a BBN for its overall performance. Concentrating data-gathering efforts on the most important CPTs therefore saves time and effort. Sensitivity analysis has also been used to validate BBNs. In addition to using the preprocessing technique to correct the laboratory data, we used the SamIam tool (Chan and Darwiche 2002) to analyze the dependencies between variables in our Bayesian networks.
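The class-and-instance construction described under Temporal Networks can be sketched without any Bayesian network library. Below, a hypothetical yearly class has one input node (TDS_prev), one output node (TDS), and protected nodes EC and pH. The sketch checks the class against the interface rules above, then unrolls it into three slices by binding each input placeholder to the previous slice's output node. Node names and structure are illustrative assumptions, and this is not Hugin's OOBN API.

```python
from dataclasses import dataclass


@dataclass
class NetworkClass:
    """A BN class with input, output and protected nodes (hypothetical sketch)."""
    name: str
    inputs: frozenset
    outputs: frozenset
    protected: frozenset
    edges: tuple  # (parent, child) pairs, all inside the class

    def __post_init__(self):
        nodes = self.inputs | self.outputs | self.protected
        for parent, child in self.edges:
            if parent not in nodes or child not in nodes:
                raise ValueError(f"edge {parent}->{child} uses an undeclared node")
            if child in self.inputs:
                # Input nodes are placeholders: they cannot have parents in the class.
                raise ValueError(f"input node {child!r} has a parent inside the class")


def unroll(cls, n_slices, link):
    """Instantiate cls n_slices times.

    link maps each input node to the output node of the previous slice that it
    stands in for. Protected nodes never cross slices because every edge is
    copied inside its own slice.
    """
    for inp, out in link.items():
        if inp not in cls.inputs or out not in cls.outputs:
            raise ValueError(f"bad interface link {inp!r} <- {out!r}")
    edges = []
    for k in range(n_slices):
        for parent, child in cls.edges:
            if parent in link and k > 0:
                source = f"{link[parent]}[{k - 1}]"  # bind placeholder to previous slice
            else:
                source = f"{parent}[{k}]"            # slice 0 keeps its input as a root
            edges.append((source, f"{child}[{k}]"))
    return edges


# HYPOTHETICAL yearly time-slice class; node names are illustrative only.
YEAR_SLICE = NetworkClass(
    name="YearSlice",
    inputs=frozenset({"TDS_prev"}),
    outputs=frozenset({"TDS"}),
    protected=frozenset({"EC", "pH"}),
    edges=(("TDS_prev", "TDS"), ("TDS", "EC"), ("TDS", "pH"), ("EC", "pH")),
)
```

Calling `unroll(YEAR_SLICE, 3, {"TDS_prev": "TDS"})` yields twelve edges, including `("TDS[0]", "TDS[1]")` and `("TDS[1]", "TDS[2]")`. These are the inter-slice links that turn identical class instances into a DBN. The same pattern, with links between wells rather than between years, would describe the basin-level OOBN.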
SamIam suggests single and multiple parameter changes that satisfy an expert's query constraint. It helps the user make the smallest possible change to parameter values that satisfies the constraint.

APPLICATION RESULTS

We found that the developed BNs provide a useful approach to exploiting the datasets currently maintained by the MWR. The validation results gave experts a realistic assessment of the chances of achieving desired outcomes. The application was carried out with special emphasis on the advantages of Bayesian over traditional techniques. It involved monitoring the groundwater quality parameters and validating crucial assumptions.

We tested the resulting network models in two phases. In the first phase, we examined the data produced by our preprocessing model, organized as yearly measurements covering the whole basin. The first task was to identify dependencies between the groundwater quality variables in order to detect useful information on the process dynamics. The resulting network agreed very closely with the experts' intuition.

In the second phase, the aim was to test the constructed OOBNs for predicting future values of the variables. We investigated the resulting networks with the Hugin Bayesian network inference tool, analyzing measurement data from three successive time slices. Using BNs to predict future values requires a dependency model that relates variables in successive time slices, or that otherwise embeds temporal features in the model. In Bayesian reasoning, the marginal probability distribution of any node may be updated upon acquiring evidence for other nodes.

CONCLUSION AND FURTHER WORK

This work presents a Bayesian approach to the assessment of groundwater quality. Bayesian methods have been investigated and shown to offer considerable potential for groundwater quality prediction.
These methods are based on reasoning under uncertainty, and they effectively represent the relationships between the constituents of groundwater quality. The simple BNs presented here are a first step towards a comprehensive network. That network would contain the other variables that researchers consider significant for assessing groundwater quality, in the Salalah plain in particular. These variables include:

• NO₃: nitrate is an increasingly important indicator of water pollution from animal waste, human waste, fertilizers, and solid waste. Nitrate and ammonium are indicators of pollution because both are soluble in water and can penetrate to deeper zones underground.
• Microbiological indicator organisms such as E. coli and fecal coliform bacteria. For the most part, these organisms are not harmful themselves, but they indicate the presence of fecal material, which may contain disease-causing (pathogenic) organisms.
• Chemical oxygen demand (COD), a parameter that reflects the organic and inorganic content of the water and is mainly needed for pollution assessment.

Data were collected from many monitoring wells in the Salalah plain, which is located in the south of the Sultanate of Oman. We spent significant time and effort gathering sufficient relevant data for this study. We plan to continue this work by adding these variables to the resulting models in order to improve their predictive accuracy. We also demonstrated the general benefit of using OOBNs, whose identical structures can be interconnected to represent a network of successive time slices, i.e., a DBN.

REFERENCES

Anderson, M. and W. Woessner. 1992. Applied Groundwater Modeling. Academic Press.
Banerjee, A. K., P. L. Plantinga, and J. Ramirez. 1985. Monitoring Groundwater Quality: A Bayesian Approach. TR no. 773. Department of Statistics, University of Wisconsin, USA.
Borsuk, M., C. Stow, and K. Reckhow. 2004. A Bayesian network of eutrophication models for synthesis, prediction, and uncertainty analysis. Ecological Modelling 173:219–239.
Brandherm, B. and A. Jameson. 2004. An extension of the differential approach for Bayesian network inference to dynamic Bayesian networks. International Journal of Intelligent Systems 19(8):727–748.
Chan, H. and A. Darwiche. 2002. When do numbers really matter? Journal of Artificial Intelligence Research 17:265–287.
Chong, H. G. and W. J. Walley. 1996. Rule-based versus probabilistic approaches to the diagnosis of faults in wastewater treatment processes. Artificial Intelligence in Engineering 10(3):265–273.
Dagum, P. and A. Galper. 1993. Additive belief-network models. In UAI 93: Proceedings of the Ninth Annual Conference on Uncertainty in Artificial Intelligence, July 9–11, 1993, eds. D. Heckerman and E. H. Mamdani, pp. 91–98. Morgan Kaufmann.
Dames and Moore. 1992. Investigation of the Quality of Groundwater Abstracted from the Salalah Plain: Dhofar Municipality. Final Report.
Egerton, A. 1996. Achieving reliable and cost effective water treatment. Water Science and Technology 33:143–149.
Entec Europe Limited. 1998. Consultancy Services for the Study of Development Activities on Groundwater Quality of Wadi Adai, Al Khawd and Salalah Well Field Protection Zones. Contract No 96-2133, Final Report, Volume 4: Hydrogeology and Modeling. Salalah: Ministry of Water Resources.
Forbes, J., T. Huang, K. Kanazawa, and S. Russell. 1995. The BATmobile: Towards a Bayesian automated taxi. Proceedings of the International Conference on Reasoning under Uncertainty, pp. 1878–1885.
Guisan, A., T. Edwards, and T. Hastie. 2002. Generalized linear and generalized additive models in studies of species distributions: Setting the scene. Ecological Modelling 157:89–100.
Hobbs, B. F. 1997. Bayesian methods for analyzing climate change and water resource uncertainties. Journal of Environmental Management 49:53–72.
HUGIN Expert Brochure. 2005. Aalborg, Denmark: HUGIN Expert A/S. Available online at http://www.hugin.com. Accessed February 10, 2008.
Jensen, F. V. 2001. Bayesian Networks and Decision Graphs. Springer.
Kim, S., S. Imoto, and S. Miyano. 2004. Dynamic Bayesian network and nonparametric regression for nonlinear modeling of gene networks from time series gene expression data. Biosystems 75:57–65.
Kjaerulff, U. 1995. dHugin: A computational system for dynamic time-sliced Bayesian networks. International Journal of Forecasting 11:89–111.
Korb, K. B. and A. Nicholson. 2004. Bayesian Artificial Intelligence. Chapman & Hall/CRC.
Lamon, E. C. and C. A. Stow. 2004. Bayesian methods for regional-scale lake eutrophication models. Water Research 38:2764–2774.
Milligan, H. and A. Gharbi. 1995. Groundwater management on the Salalah Plain. In Oman Ministry of Water Resources, International Conference on Water Resources Management in Arid Countries 2:530–538.
Ministry of Water Resources (MWR), Sultanate of Oman. 2004. Law on the Protection of Water Resources. Promulgated by Decree of the Sultan No. 29 of 2004, and its implementing regulations (Regulations for the organization of wells and aflaj, and Regulations for the use of water desalination units on wells) (in Arabic).
Nefian, A., L. Liang, X. Pi, X. Liu, and K. Murphy. 2002. Dynamic Bayesian networks for audio-visual speech recognition. EURASIP Journal on Applied Signal Processing 11:1–15.
Nicholson, A. E. and J. M. Brady. 1994. Dynamic belief networks for discrete monitoring. IEEE Transactions on Systems, Man, and Cybernetics 24(11):1593–1610.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco: Morgan Kaufmann.
Russell, S. and P. Norvig. 2003. Artificial Intelligence: A Modern Approach. 2nd edition. New Jersey: Prentice Hall, Inc.
Shihab, K. and N. Al-Chalabi. 2004. A Bayesian framework for groundwater quality assessment. Lecture Notes in Computer Science 3029:728–738.
Shortliffe, E. H. et al. (eds.). 1990. Medical Informatics: Computer Applications in Health Care. Reading, MA: Addison-Wesley.
Stow, C. A., C. Roessler, M. E. Borsuk, J. Bowen, and K. H. Reckhow. 2003. Comparison of estuarine water quality models for TMDL development in Neuse River Estuary. Journal of Water Resources Planning and Management 129:307–314.
Trigg, D. S., W. J. Walley, and S. J. Ormerod. 2000. A prototype Bayesian belief network for diagnosis of acidification in Welsh rivers. In Development and Application of Computer Techniques in Environmental Studies, ed. G. Ibarra-Berastegi, pp. 163–172.
Varis, O. 1995. Belief networks for modeling and assessment of environmental change. Environmetrics 6:439–444.
Wu-Seng, L. 1993. Water Quality Modeling. CRC Press, Inc.
Zou, M. and S. D. Conzen. 2005. A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics 21(1):71–79.

Abstract

Applied Artificial Intelligence, 22:352–376 Copyright # 2008 Taylor & Francis Group, LLC ISSN: 0883-9514 print/1087-6545 online DOI: 10.1080/08839510701821645 DYNAMIC MODELING OF GROUNDWATER POLLUTANTS WITH BAYESIAN NETWORKS Khalil Shihab School of Computer Science and Mathematics, Victoria University, Victoria, Australia The emphasis on the need to protect groundwater quality has resulted in an increased interest in groundwater quality assessment. Water experts and researchers in the area have been, however, arguing that the currently used techniques are not accurate means of measuring ground- water contamination. It is mainly because these techniques neglect not only the probabilistic dependencies between pollutants but also the precision and the accuracy of the tested methods used by environmental laboratories. Therefore, this work describes the development and application of a prototype Dynamic Bayesian Network (DBN) that addresses these problems through the use of a temporal probabilistic model. First, we present a new technique for data preprocessing. Then we describe the network models we developed, as well as the methods used to build these models. Various challenges, such as acquiring groundwater datasets, identifying pollutants and anticipating potential problem contaminants, are addressed. Finally, we present the results of applications of these models. INTRODUCTION Declining surface and groundwater quality is regarded as the most serious and persistent issue affecting Oman in particular. The Sultanate faces severe challenges as it confronts the extremely growing and compli- cated issues of contamination of the groundwater supply in and around hazardous waste disposal sites across the nation. There are many observable factors contributing to the deterioration of water quality. These factors need to be monitored and their maximum allowable limits need to be determined. 
Decline in water quality is manifested in a number of ways, for example, elevated nutrient levels, acid from mines, domestic and oil spill, waste from distilleries and factories, salt water intrusion and temperature. These factors and others will provide the input data for our computer system. Address correspondence to Khalil Shihab, currently on sabbatical leave to the Department of Computer Science, Box 36, SQU, Al-Khod 123, Oman. E-mail: kshihab@squ.edu.om Dynamic Modeling of Groundwater Pollutants 353 Groundwater quality and pollution are determined and measured by comparing physical, chemical, biological, microbiological, and radiological quantities and parameters to a set of standards and criteria (Anderson and Woessner 1992). A criterion is basically a scientific quantity upon which a judgment can be based (Wu-Seng 1993). In this work, however, we con- sidered only the chemical parameters, total dissolved solids (TDS), electri- cal conductivity (EC), water pH, chemical oxygen demand (COD), and nitrate (NO ), for more details see the heading Bayesian Networks Devel- opment. This is mainly because these parameters are recommended by the experts and the researchers in the area. In addition, the results of our analysis of data collected from many wells implied that these chemical parameters are useful indicators of groundwater quality because they con- stitute the majority of the variance in the data scatter. Various countries have attempted to develop satisfactory procedures for assessing, monitoring, and controlling contamination of the groundwater sup- ply in and around hazardous waste disposal sites (Anderson and Woessner 1992; Borsuk et al. 2004). These attempts resulted in various environmental regulations that focus attention on the maximum allowable limits of hazard- ous pollutants in the groundwater supply. 
However, they pay scant attention to the nature of groundwater data and the development of valid statistical procedures for detecting and monitoring groundwater contamination. Recent attempts based on Artificial Intelligence (AI) were first applied to the interpretation of biomonitoring data (Trigg et al. 2000). Other works were based on pattern recognition using artificial neural networks (NNs). A more recent study described a prototype Bayesian belief network for the diagnosis of acidification in Welsh rivers. Hobbs (1997) uses Bayesian probabilities to examine the risk of climate change on water resources, but does not extend this to drinking water quality or quantity. Egerton (1996) performs a risk analysis of water systems in terms of the cost- effectiveness of reliability improvements; while that study examines issues of regulatory compliance, it does not evaluate the effect of operator intervention or weather events on vulnerability. Bayesian methods of statistical inference offer the greatest potential for groundwater monitoring. This is because these methods can be used to recognize the variability arising from three different sources of errors, namely, analytical test errors, sampling errors, and time errors, in addition to the variability in the true concentration (Chong and Walley 1996). The Bayesian methods can also be used to significantly increase the precision and the accuracy of the test methods used in a given environmental labora- tory (Varis 1995). The mobility of salt and other pollutants in steady state and transient environmental conditions can be predicted by applying Bayesian models to a range of spatial and temporal scales under varying environmental conditions. Bayesian networks use statistical techniques that 354 K. Shihab tolerate subjectivity and small data sets. 
Furthermore, these methods are simple to apply and have sufficient flexibility to allow reaction to scientific complexity free from impediment from purely technical limitations. The process of Bayesian analysis begins by postulating a model in light of all available knowledge taken from relevant phenomenon. The previous knowledge as represented by the prior distribution of the model para- meters is then combined with the new data through Bayes’ theorem to yield the current knowledge (represented by the posterior distribution of model parameters). This process of updating information about the unknown model parameters is then repeated in a sequential manner as more and more new information becomes available. This work addresses the assessment of groundwater quality in the Sultanate of Oman, especially in the Salalah plain. Its primary aim is to develop a groundwater quality model and computer system prototype to assess and predict the impact of pollutants on the water column. SULTANATE OF OMAN, SALALAH PLAIN Oman (see Figure 1) has very substantial groundwater resources on which the country’s agriculture depends. The oil boom, the resultant popu- lation boom (possibly fivefold since the 1960s), and the new investment FIGURE 1 Sultanate of Oman. Dynamic Modeling of Groundwater Pollutants 355 have led to a large expansion in irrigated areas. The demand for domestic water supply has also increased, as living standards have risen. Oman has to tackle simultaneously, within a compressed timescale, the need to evaluate its groundwater resources and manage them effectively. The main populated areas are located in the north, along the flanks of the mountains, and in the south, around Oman’s second city, Salalah. The Salalah plain extends over a 253 km area to the north of the Omani coastline of the Arabian Sea to the Mountains of Dhofar. It is the only region in Oman to benefit from a substantial amount of rainfall from the southern monsoon Khareef. 
The average annual rainfall is about 110 mm but can range from 70 to 360 mm. July–August is normally the ‘‘wet’’ period. Groundwater derived from aquifers in the central part of the plain is of good quality. Some of the spring water is utilized by Falajs (i.e., tunnels dug horizontally to tap and transport underground water to agricultural fields that are often tens of kilometers away) to provide irrigation water to a part of the plain. Recharge is by underflow from mountains and from the springs. Modern irrigation techniques are in operation in large commercial farms mainly for the production of forage crops such as alfalfa and Rhodes grass. Recent economic development in the country, together with rapid expansion of the population has not only increased the demand for water, but also caused many threats to water resources and quality. A number of groundwater pollution incidents have been reported. The extensive utiliza- tion of groundwater resources without taking into consideration the safe yield of aquifers is considered the main cause of pollution. Point and non-point source contamination from agriculture, industrial and domestic uses are other sources of contamination of groundwater. Sea water intrusion is also another problem of concern since lots of farms are situated along the coastal line. The Ministry of Water Resources (MWR) in the Sultanate of Oman has been monitoring the groundwater quality since 1994. The regional moni- toring networks were completed in 1995 (Milligan and Gharbi 1995). More than 50,000 monitoring wells have been inventoried, in the course of which water samples have been collected and analyzed providing baseline data for environmental monitoring, which is consolidated in the national water quality database. The MWR has attempted to predict the groundwater quality by using traditional linear regression and nonmetric multidimensional scaling models to interpret groundwater data. 
So far these models have proven unsatisfactory mainly because they ignore the probabilistic temporal dependencies between water quality constituents, prompting the development of new models based on Bayesian techniques, which are the focus of this work. 356 K. Shihab FIGURE 2 Taqah region, which is the eastern part of the Salalah plain in Oman. Therefore, this work shows the development and the applications of Bayesian techniques to forecast groundwater pollution levels in the Salalah plain, in particular in the Taqah area, which is the eastern part of the Salalah plain (see Figure 2), for several reasons, including: . To be able to take the necessary emergency measures if the pollution level is going to affect the groundwater quality (i.e., when the level of pollution is going to be above a certain threshold); . To estimate the pollution in an area where there are no measurements; . To take preventive actions in some areas; . To find out which variables have the most influence on groundwater quality. DATA COLLECTION The Ministry of Water Resources (MWR) maintains data on the concen- tration of the harmful substances in the groundwater at Taqah monitoring sites, which are located to the south of the Sultanate of Oman, in the Salalah plain (MWR 2004). We observed that good quality data were obtained from several monitoring wells in this region. Because of the lack of monitoring wells in certain areas in that region, we filled in the missing measurements with data obtained from Oman Mining Company (OMCO) and Ministry of Environmental and Regional Municipalities (MRME). The MWR identified that the datasets collected from these monitoring wells in the Sultanate are important in assessing the groundwater quality and in the prediction of the effect of certain pollutants on drinking water. Dynamic Modeling of Groundwater Pollutants 357 The period covered in these locations is from 1994 to 2004 (Dames and Moore 1992; Entec Europe Limited 1998; MWR 2004). 
Dames & Moore, Inc. represents a global network of companies known as the Dames & Moore Group. Dames & Moore specializes in a variety of civil engineering, environmental, and earth sciences disciplines, providing consulting services to corporate and government customers. Each site has several monitoring wells; water samples were collected periodically from these wells, and the concentration of the pollutants in each sample was recorded.

DATA PREPROCESSING USING BAYESIAN REASONING

Data for water quality assessment are normally collected from various monitoring wells and then analyzed in environmental laboratories to measure the concentration of a number of water quality constituents. We realized that the methods used by these laboratories do not emphasize accuracy, and that both laboratory and validation personnel are often unaware of the possibility of false positives in environmental data. To overcome this problem and obtain representative data, we preprocessed the datasets used to develop the Bayesian Networks with the following modification of the Bayesian model developed by Banerjee et al. (1985).

Bayesian Models

The formulation of the model is as follows. Let S denote a particular hazardous constituent of interest. Since the concentration of the substance may vary from one well to another, it is necessary to consider each well separately. Let $\mathbf{x}_t = (x_{t1}, x_{t2}, x_{t3}, \ldots, x_{tm})$ be the vector of m measurements of the concentration of S in m distinct water samples from a given well at sampling occasion t, where $m \ge 1$ and $t = 1, 2, \ldots$. Each measurement consists of the true concentration of S plus an error. Let $X_t$ be the true concentration of S in the groundwater at sampling occasion t. Treating the true concentration $X_t$ as an unknown random variable, the model evaluates the posterior distribution of $X_t$ given the sample measurements $\mathbf{x}_t$.
All published work on groundwater quality data has rested on the normality assumption. That is, given $X_t$ and $\delta$, the concentration measurements in $\mathbf{x}_t$ represent a random sample of size m from a normal distribution with mean $X_t$ and variance $\delta$. Since the concentration of S in water samples obtained at different sampling occasions might vary considerably, we assume that the parameters $X_t$ and $\delta$ of the normal distribution are random variables with certain prior probability distributions. To model these prior distributions, we used the natural conjugate family of distributions for sampling from a normal distribution. The prior distribution of $X_t$ and $\delta$ can therefore be presented as follows. For $t = 1, 2, \ldots$ and given $\delta$, the conditional distribution of $X_t$ at sampling occasion t is a normal distribution with mean $\mu_{t-1}$ and variance $\sigma_{t-1}^2\delta$, and the marginal distribution of $\delta$ is an inverted gamma distribution with parameters $\beta_{t-1}$ and $\nu_{t-1}$.

This model uses the following prior distribution, which represents knowledge of the concentration before the first sampling. The pdf of the prior distribution of $X_0$ is

$$f(x_0) \propto \left[1 + \frac{1}{2\nu_0}\left(\frac{x_0 - \mu_0}{\sigma_0\sqrt{\beta_0/\nu_0}}\right)^2\right]^{-(2\nu_0+1)/2} \qquad (2.1)$$

which is the pdf of Student's t-distribution with $2\nu_0$ degrees of freedom, location parameter $\mu_0$, and squared scale $\sigma_0^2\beta_0/\nu_0$.
Now suppose that observations on the concentration of S become available. Given the sample $\mathbf{x}_t$, the posterior marginal distribution of $X_t$ is a Student's t-distribution with $2\nu_t$ degrees of freedom, location parameter $\mu_t$, and squared scale $\sigma_t^2\beta_t/\nu_t$, with pdf

$$f(x_t \mid \mathbf{x}_t) \propto \left[1 + \frac{1}{2\nu_t}\left(\frac{x_t - \mu_t}{\sigma_t\sqrt{\beta_t/\nu_t}}\right)^2\right]^{-(2\nu_t+1)/2} \qquad (2.2)$$

where

$$\begin{aligned}
\beta_t &= \beta_{t-1} + \frac{1}{2}\sum_{j=1}^{m}\left(x_{tj} - \bar{x}_t\right)^2 + \frac{m\left(\mu_{t-1} - \bar{x}_t\right)^2}{2\left(1 + m\sigma_{t-1}^2\right)},\\
\nu_t &= \nu_{t-1} + \frac{m}{2},\\
\mu_t &= \frac{\mu_{t-1} + m\bar{x}_t\sigma_{t-1}^2}{1 + m\sigma_{t-1}^2},\\
\sigma_t^2 &= \frac{\sigma_{t-1}^2}{1 + m\sigma_{t-1}^2},\\
\bar{x}_t &= \frac{1}{m}\sum_{j=1}^{m} x_{tj}.
\end{aligned} \qquad (2.3)$$

The sequential nature of this posterior distribution is evident from the equation for $\mu_t$: at each sampling occasion t, when new information about the concentration of S in the groundwater is received, the posterior distribution is revised, forming a recursion. This updating may be continued indefinitely as new data $\mathbf{x}_t$ become available.

To represent the true unknown concentration of S in the well under consideration, it is frequently more convenient to give a range (or interval) that contains most of the posterior probability. Such intervals are called highest posterior density (HPD) intervals. For a given probability content $1 - \alpha$, $0 < \alpha < 1$, a $100(1-\alpha)$ percent HPD interval for $X_t$ is given by

$$\mu_t \pm t_{2\nu_t}(\alpha/2)\,\sigma_t\sqrt{\beta_t/\nu_t}, \qquad (2.4)$$

where $t_{2\nu_t}(\alpha/2)$ is the $100(1-\alpha/2)$ percentile of Student's t-distribution with $2\nu_t$ degrees of freedom. The lower and upper limits of this interval are denoted LHPD and UHPD below.

The Bayesian Algorithm

In brief, the monitoring algorithm, which is based on the Bayesian model, is as follows:

1. Fix a value of $\alpha$ ($0 < \alpha < 1$) based on the desired confidence level. In this case, we chose $\alpha = 0.01$.
2. Since we do not have enough data to estimate them, we used the same prior parameters as the model of Banerjee, Plantinga, and Ramirez: $\beta_0 = 0.0073$, $\nu_0 = 2.336$, $\mu_0 = 9.53$, $\sigma_0^2 = 3056.34$.
3. At each sampling occasion t ($t = 1, 2, \ldots$), compute the parameters $\beta_t$, $\nu_t$, $\mu_t$, and $\sigma_t^2$ of the posterior distribution of $X_t$ given the observations in $\mathbf{x}_t$ on the concentration of S available from a given well at a given site, using (2.3). Compute LHPD and UHPD using these parameter estimates and (2.4).
4. Plot $\mu_t$, LHPD, and UHPD obtained in step 3 against sampling occasion t.
5. For the next sampling occasion, update the parameters $\beta_t$, $\nu_t$, $\mu_t$, and $\sigma_t^2$ using (2.3) and the newly obtained data. Then recompute LHPD and UHPD using the updated parameter values in (2.4) and repeat step 4.

We have applied this algorithm to the datasets collected from Salalah in the Sultanate of Oman. The dataset from each well is not expected to be normal as a whole, but each sample is assumed to be drawn from a normal distribution. Some of these datasets needed to be scaled down to simplify the process and to produce smooth graphs that are easier to study. For this purpose, we used the following normalization:

$$x_i' = \frac{x_i - \bar{x}}{\sigma}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}}.$$

Algorithm Implementation

The preprocessing system is implemented on a PC platform using the Visual Basic programming language. Figure 3 shows the main window, where the user enters the data: the location, the substance to be studied, the well number, the sampling occasion, and the measured concentration. Tables 1 and 2 present the concentration data for TDS (total dissolved solids) and pH, respectively, for Well 001/577 in the Taqah area. In particular, the tables show the true concentrations of TDS and pH estimated by our preprocessing system.

FIGURE 3 The main window of the preprocessing module.
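The monitoring algorithm above (steps 1–5), with the recursion (2.3) and the interval (2.4), can be sketched in Python as follows. This is a minimal illustration, not the Visual Basic preprocessing system; it takes the prior values listed in step 2, reading 3056.34 as $\sigma_0^2$ (the quantity updated last in (2.3)), and computes the Student's t percentile with a small numerical routine (Simpson-rule integration plus bisection) so that only the standard library is needed. The names `Posterior`, `update`, `hpd`, and `monitor` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Posterior:
    """Hyperparameters of the normal/inverted-gamma model at one occasion."""
    beta: float    # beta_t, inverted-gamma parameter
    nu: float      # nu_t, inverted-gamma parameter
    mu: float      # mu_t, location of X_t
    sigma2: float  # sigma_t^2, multiplies delta in the variance of X_t


# Step 2: prior values quoted in the text (3056.34 read as sigma_0^2).
PRIOR = Posterior(beta=0.0073, nu=2.336, mu=9.53, sigma2=3056.34)


def update(prev: Posterior, x: list[float]) -> Posterior:
    """Recursion (2.3) for the m measurements x taken at one occasion."""
    m = len(x)
    xbar = sum(x) / m
    ss = sum((xj - xbar) ** 2 for xj in x)
    denom = 1.0 + m * prev.sigma2
    return Posterior(
        beta=prev.beta + ss / 2 + m * (prev.mu - xbar) ** 2 / (2 * denom),
        nu=prev.nu + m / 2,
        mu=(prev.mu + m * xbar * prev.sigma2) / denom,
        sigma2=prev.sigma2 / denom,
    )


def t_pdf(u: float, df: float) -> float:
    """Density of the standard Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))


def t_quantile(p: float, df: float, n: int = 400) -> float:
    """Quantile q with P(T <= q) = p, for p > 0.5.

    The CDF is 0.5 plus a Simpson-rule integral of the density over [0, q];
    q is then found by bisection on [0, 100].
    """
    def cdf(q: float) -> float:
        h = q / n
        s = t_pdf(0.0, df) + t_pdf(q, df)
        s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, n))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def hpd(post: Posterior, alpha: float = 0.01) -> tuple[float, float]:
    """Interval (2.4): mu_t -/+ t_{2 nu_t}(alpha/2) * sigma_t * sqrt(beta_t / nu_t)."""
    q = t_quantile(1 - alpha / 2, 2 * post.nu)
    half = q * math.sqrt(post.sigma2 * post.beta / post.nu)
    return post.mu - half, post.mu + half


def monitor(occasions: list[list[float]], prior: Posterior = PRIOR,
            alpha: float = 0.01) -> list[tuple[float, float, float]]:
    """Steps 3-5: (LHPD, expected true concentration, UHPD) for each occasion."""
    rows, post = [], prior
    for x in occasions:
        post = update(post, x)        # revise the posterior with the new data
        lhpd, uhpd = hpd(post, alpha)
        rows.append((lhpd, post.mu, uhpd))
    return rows
```

With one measurement per occasion (m = 1), the first TDS reading in Table 1 (1.147) gives a posterior mean of about 1.15, the expected true concentration reported there, and an interval of roughly 0.84 to 1.46 against the tabulated 0.85 to 1.45. Later rows depend on details the text does not give (for instance, the number of samples per occasion), so exact agreement with Tables 1 and 2 should not be assumed.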
TABLE 1 Concentration Data of TDS for Well 001/577 in the Salalah Plain

Date  Observed conc.  LHPD  Expected true conc.  UHPD
84    1.147           0.85  1.15                 1.45
85    1.106           1.00  1.13                 1.26
86    1.938           1.12  1.40                 1.68
87    2.237           1.33  1.61                 1.88
88    3.857           1.60  2.06                 2.52
89    3.834           1.91  2.35                 2.79
90    3.957           2.18  2.58                 2.98
91    3.761           2.38  2.73                 3.08
92    4.3             2.58  2.90                 3.23
93    3.958           2.72  3.01                 3.30
94    1               2.54  2.83                 3.11
95    3.714           2.64  2.90                 3.16
96    3.65            2.73  2.96                 3.19
97    3.381           2.78  2.99                 3.20
98    3.396           2.83  3.02                 3.20
99    3.477           2.87  3.04                 3.22
00    3.498           2.91  3.07                 3.23
01    3.23            2.93  3.08                 3.23
02    3.243           2.95  3.09                 3.22
03    3.267           2.97  3.10                 3.22
04    3.297           2.99  3.11                 3.22

TABLE 2 Concentration Data of pH for Well 001/577 in the Salalah Plain

Date  Observed conc.  LHPD  Expected true conc.  UHPD
84    7.8             7.6   7.8                  8.0
85    7.7             7.65  7.75                 7.85
86    7.2             7.38  7.57                 7.76
87    7               7.24  7.43                 7.61
88    7               7.18  7.34                 7.5
89    7.2             7.19  7.32                 7.44
90    7               7.16  7.27                 7.38
91    7               7.14  7.24                 7.33
92    7.6             7.19  7.28                 7.36
93    7.4             7.22  7.29                 7.36
94    7.4             7.24  7.30                 7.36
95    7               7.22  7.28                 7.33
96    6.1             7.11  7.19                 7.26
97    6.7             7.08  7.15                 7.22
98    7.1             7.08  7.15                 7.21
99    5.7             6.97  7.06                 7.14
00    5.7             6.89  6.98                 7.07
01    6.3             6.85  6.94                 7.02
02    6.2             6.82  6.90                 6.98
03    6.4             6.80  6.88                 6.95
04    6.4             6.78  6.85                 6.92

FIGURE 4 Monitoring chart of TDS for Well MW1, Salalah plain.

Figures 4 and 5, which plot Tables 1 and 2, respectively, show whether the three quantities (expected true concentration, LHPD, and UHPD) lie within the maximum and minimum levels allowed for TDS and pH. These figures provide a rudimentary prediction of groundwater quality. For example, Figure 5 shows that well MW1 is contaminated because its true pH is below the allowed level, and hence the water is acidic.
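The comparison behind Figures 4 and 5 can be made explicit. The sketch below takes the pH rows of Table 2 and the drinking-water pH range of Table 3 (7.0–8.5) and labels each year with a hypothetical three-way rule: "outside" when the expected true value breaks a limit, "borderline" when only the HPD interval crosses one, and "inside" otherwise. The rule and the function name `status` are illustrative assumptions, not the paper's procedure.

```python
# Table 2 (pH, Well 001/577): year -> (LHPD, expected true concentration, UHPD)
TABLE_2 = {
    1984: (7.60, 7.80, 8.00), 1985: (7.65, 7.75, 7.85), 1986: (7.38, 7.57, 7.76),
    1987: (7.24, 7.43, 7.61), 1988: (7.18, 7.34, 7.50), 1989: (7.19, 7.32, 7.44),
    1990: (7.16, 7.27, 7.38), 1991: (7.14, 7.24, 7.33), 1992: (7.19, 7.28, 7.36),
    1993: (7.22, 7.29, 7.36), 1994: (7.24, 7.30, 7.36), 1995: (7.22, 7.28, 7.33),
    1996: (7.11, 7.19, 7.26), 1997: (7.08, 7.15, 7.22), 1998: (7.08, 7.15, 7.21),
    1999: (6.97, 7.06, 7.14), 2000: (6.89, 6.98, 7.07), 2001: (6.85, 6.94, 7.02),
    2002: (6.82, 6.90, 6.98), 2003: (6.80, 6.88, 6.95), 2004: (6.78, 6.85, 6.92),
}
PH_MIN, PH_MAX = 7.0, 8.5  # drinking-water pH range from Table 3


def status(lhpd: float, mean: float, uhpd: float,
           low: float = PH_MIN, high: float = PH_MAX) -> str:
    """Hypothetical rule comparing an estimate and its HPD interval with a range."""
    if mean < low or mean > high:
        return "outside"     # the expected true value breaks a limit
    if lhpd < low or uhpd > high:
        return "borderline"  # the estimate is inside, but the interval crosses a limit
    return "inside"


flags = {year: status(*row) for year, row in TABLE_2.items()}
```

Under this rule, 2000 to 2004 fall outside the allowed range and 1999 is borderline, which is consistent with the observation that the pH of well MW1 has dropped below the allowed level.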
BAYESIAN NETWORKS

After the preprocessing stage, we constructed a Bayesian Network (BN) and used it as the initial building network for the construction of two Dynamic Bayesian Networks that predict the impact of pollution on groundwater quality.

Bayesian Belief Networks (BBNs)

BBNs are effective and practical representations of knowledge for reasoning under uncertainty. There are a number of successful applications of these networks in such domains as diagnosis, prediction, planning, learning, vision, and natural language understanding (Nicholson and Brady 1994; Russell and Norvig 2003).

FIGURE 5 Monitoring chart of pH for Well MW1, Salalah plain.

BNs (see Figure 6) are graphical structures used for representing expert knowledge, drawing conclusions from input data, and explaining the reasoning process to the user. These networks are also called knowledge maps, probabilistic causal networks, and qualitative probabilistic networks (Jensen 2001). They have become increasingly popular knowledge representations for reasoning under uncertainty. A BN is a directed acyclic graph (DAG) whose structure corresponds to the dependency relations among the set of variables represented in the network (nodes). Each node in a belief network represents a random variable, or uncertain quantity, that can take two or more possible values. The arcs signify the existence of direct influences between the linked variables, and the strengths of these influences are quantified by conditional probabilities. These links are often given a causal interpretation. The graph in Figure 6 represents the following joint probability distribution of the variables V, Y, U, W, X, and T:

$$P(U, V, Y, W, X, T) = P(T \mid W)\,P(X \mid W)\,P(W \mid V, Y)\,P(U \mid V)\,P(V \mid Y)\,P(Y)$$

FIGURE 6 A simple BN.

This result is obtained by applying the chain rule and using the dependency information represented in the network.
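To make the factorization concrete, the sketch below encodes the network of Figure 6 with binary variables and evaluates the joint probability as the product of the six local factors. The conditional probabilities are hypothetical numbers chosen only for illustration; the paper gives no values for Figure 6.

```python
from itertools import product

# Hypothetical CPTs for Figure 6; each value is P(child = True | parents).
P_Y = 0.3
P_V = {True: 0.8, False: 0.2}                   # P(V | Y)
P_U = {True: 0.7, False: 0.1}                   # P(U | V)
P_W = {(True, True): 0.9, (True, False): 0.6,   # P(W | V, Y)
       (False, True): 0.5, (False, False): 0.05}
P_X = {True: 0.75, False: 0.2}                  # P(X | W)
P_T = {True: 0.85, False: 0.1}                  # P(T | W)


def bern(p_true: float, value: bool) -> float:
    """Probability of a binary value given P(value = True)."""
    return p_true if value else 1.0 - p_true


def joint(u: bool, v: bool, y: bool, w: bool, x: bool, t: bool) -> float:
    """P(U,V,Y,W,X,T) = P(T|W) P(X|W) P(W|V,Y) P(U|V) P(V|Y) P(Y)."""
    return (bern(P_T[w], t) * bern(P_X[w], x) * bern(P_W[(v, y)], w)
            * bern(P_U[v], u) * bern(P_V[y], v) * bern(P_Y, y))


# Sum over all 2**6 assignments of (U, V, Y, W, X, T).
total = sum(joint(*values) for values in product((True, False), repeat=6))
```

Because each factor is a proper conditional distribution, the 64 joint probabilities sum to 1. The factored form needs 13 numbers (1 for P(Y), 2 each for P(V | Y), P(U | V), P(X | W), and P(T | W), and 4 for P(W | V, Y)), against 63 for an unrestricted joint distribution over six binary variables.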
Here P(Y) is called the prior probability, and P(T | W), P(X | W), P(W | V, Y), P(U | V), and P(V | Y) are called conditional probabilities. Prior probabilities, which are based on initial information, can be obtained from statistical data using relative frequencies, whereas conditional probabilities can be elicited from experts or calculated using various types of mathematical models.

Within a Bayesian network, the basic computation is to calculate the belief of each node, that is, its probability conditioned on the evidence that has been observed. This consists of instantiating the input variables and propagating their effect through the network to update the probability of the hypothesis variables. An important purpose of BNs is to facilitate the calculation of arbitrary conditional probabilities. Various techniques have been developed for evaluating node beliefs and for performing probabilistic inference. The most popular methods are due to Pearl (1988). Similar techniques have been developed for constraint networks in the Dempster-Shafer formalism (Russell and Norvig 2003; Jensen 2001).

We examined the dependencies within the network in order to establish weak and strong influences among the variables in the model and to identify the variables that are important for water quality. This procedure helps in forming heuristics that are cost-effective and useful not only for probabilistic inference but also for the automatic construction of a belief network from data.

Dynamic Bayesian Networks (DBNs)

The problem of assessing and forecasting water quality requires modeling not only the static probabilistic dependencies between its constituents (variables) but also the dynamic behavior of these constituents. DBNs can easily capture both static and dynamic behavior (Kevin and Nicholson 2004). They extend BNs from static domains to dynamic domains (Nicholson and Brady 1994; Brandherm and Jameson 2004).
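Belief updating can be illustrated on the network of Figure 6 by brute-force enumeration: sum the factored joint over every assignment consistent with the evidence, then normalize. This is exact but exponential in the number of variables, which is why practical tools use propagation algorithms such as Pearl's or junction-tree methods instead. The CPT values and the function name `belief` below are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical CPTs for Figure 6; each value is P(child = True | parents).
P_Y = 0.3
P_V = {True: 0.8, False: 0.2}                   # P(V | Y)
P_U = {True: 0.7, False: 0.1}                   # P(U | V)
P_W = {(True, True): 0.9, (True, False): 0.6,   # P(W | V, Y)
       (False, True): 0.5, (False, False): 0.05}
P_X = {True: 0.75, False: 0.2}                  # P(X | W)
P_T = {True: 0.85, False: 0.1}                  # P(T | W)
NAMES = ("U", "V", "Y", "W", "X", "T")


def bern(p_true: float, value: bool) -> float:
    return p_true if value else 1.0 - p_true


def joint(U: bool, V: bool, Y: bool, W: bool, X: bool, T: bool) -> float:
    """Factored joint distribution of Figure 6."""
    return (bern(P_T[W], T) * bern(P_X[W], X) * bern(P_W[(V, Y)], W)
            * bern(P_U[V], U) * bern(P_V[Y], V) * bern(P_Y, Y))


def belief(query: str, evidence: dict[str, bool]) -> float:
    """P(query = True | evidence), summing the joint over consistent worlds."""
    mass = {True: 0.0, False: 0.0}
    for values in product((True, False), repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if all(world[k] == v for k, v in evidence.items()):
            mass[world[query]] += joint(**world)
    return mass[True] / (mass[True] + mass[False])
```

With no evidence, the belief in Y equals its prior (0.3); observing T = True raises it to about 0.582, because T depends on W, which is more likely when Y is true.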
A static BN can be extended to a DBN by introducing relevant temporal dependencies between representations of the static network at different times. In contrast to time-series models that use regression to represent correlations, DBNs represent the temporal causal relationships between variables. DBNs can therefore express more general dependency models that capture richer and more realistic dynamic dependencies as well as the traditional static belief-network dependencies (Kim et al. 2004; Nefian et al. 2002).

A series of BNs, which act as time slices, can be connected to create a DBN. As new evidence is added to a DBN, new time slices are added. To reduce computational complexity, old time slices are commonly removed and their information summarized into the prior probabilities of the following slices. This produces a moving window of slices.

DBNs rest on two assumptions. Let $X_t$ be the state of the system at time t, and assume that:

1. The process is Markovian, i.e., $P(X_t \mid X_0, X_1, \ldots, X_{t-1}) = P(X_t \mid X_{t-1})$.
2. The process is stationary (time-invariant), i.e., $P(X_t \mid X_{t-1})$ is the same for every t.

To specify a DBN, we therefore need only $P(X_0)$, which is a static BN, and $P(X_t \mid X_{t-1})$, which is a network fragment in which the variables in $X_{t-1}$ have no parents.

DBNs can be used effectively and cheaply for monitoring and predicting complex situations that change over time, such as water quality. For example, they have recently been used for predicting the outcome in critically ill patients. They have also been used for monitoring and controlling highway traffic (Forbes et al. 1995), for identifying gene regulatory networks from microarray data (Zou and Conzen 2005), and for predicting river and lake water pollution (Lamon and Stow 2004; Shihab and Al-Chalabi 2004; Stow et al. 2003).
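The two assumptions above mean that a DBN is fully specified by $P(X_0)$ and $P(X_t \mid X_{t-1})$. The sketch below uses the smallest possible case, a single discrete state variable (water quality "good" or "poor"), to show the prediction and filtering recursions; the belief vector it carries forward is the summary that replaces dropped time slices in a moving window. All probabilities are hypothetical, and a real DBN for this problem would have several variables per slice (e.g., TDS, EC, and pH).

```python
# Hypothetical one-variable DBN: state 0 = "good" water, state 1 = "poor" water.
P_X0 = [0.9, 0.1]            # P(X_0), the initial (static) network
TRANSITION = [[0.95, 0.05],  # P(X_t | X_{t-1} = good)
              [0.20, 0.80]]  # P(X_t | X_{t-1} = poor); the same for every t


def predict(belief: list[float]) -> list[float]:
    """One slice forward: P(X_t = j) = sum_i P(X_{t-1} = i) P(X_t = j | X_{t-1} = i)."""
    n = len(belief)
    return [sum(belief[i] * TRANSITION[i][j] for i in range(n)) for j in range(n)]


def filter_step(belief: list[float], likelihood: list[float]) -> list[float]:
    """Predict, then condition on this slice's evidence (likelihood[j] = P(e_t | X_t = j))."""
    unnorm = [p * lik for p, lik in zip(predict(belief), likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Moving window: only the current belief is kept; earlier slices are summarized in it.
forecast = P_X0
for _ in range(3):
    forecast = predict(forecast)  # three-step-ahead prediction with no new evidence
```

Calling `predict` repeatedly gives multi-step forecasts; with these numbers the forecast converges to the stationary distribution (0.8, 0.2), while `filter_step` folds in each year's evidence before moving on.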
Inference is performed as if the network were a normal BN, although the nature of DBNs usually results in larger and more complex networks that require more computation to update. Several researchers have recently adapted standard belief-network representation and inference techniques to support temporal reasoning. Dagum and Galper (1993) and Guisan et al. (2002), for example, introduced additive generalizations of belief-network representation and inference techniques. They integrated these techniques with the fundamental methods of Bayesian time-series analysis to generate a dynamic network model, which was applied to predict the progress of a patient in a surgical intensive care unit. Other techniques developed by Shortliffe and his colleagues (1990) have been applied to diagnosis in internal medicine, diagnosis of gas turbines in power generation, and text retrieval from a large body of writing.

The temporal repetition of identical model structures encourages the integration of object-oriented techniques with Bayesian networks. This modeling technique has received increasing interest in the literature over the past decade. It started with methods for reusing elements of network specifications and for dividing large networks into smaller pieces. The success of these and other object-oriented Bayesian network (OOBN) models in real-world applications encouraged us to develop a model and a computer system based on the OOBN representation to assess and predict water quality. We therefore used the Hugin (HUGIN Expert, 2008) and dHugin (Kjaerulff 1995) tools to implement our Bayesian networks. The Hugin system supports the implementation of OOBNs. It treats a Bayesian Network (BN) as a special case of an OOBN, namely the initial building network; the other networks in the OOBN are nodes that represent instances of this base network.
dHugin (Kjaerulff 1995), implemented on top of Hugin, is based on a message-passing algorithm in junction trees, which is a version of probability updating in singly connected directed acyclic graphs (DAGs) (Jensen 2001). Inference, i.e., the probability update over the current time window and the time slices preceding it, is performed by passing messages between junction trees with this algorithm.

Bayesian Networks Development

As mentioned above, this study covers the Taqah area (see Figure 2), which is the main part of the Salalah plain. This area extends from the foothills of the mountains to the arid desert. The desert here is of two types: the semi-desert (Badiyah) and the arid desert (Al Sahra). Some of the rural areas around Taqah receive a touch of the drizzle that falls on Salalah during the rainy season (Khareef). Of the more than twenty wells in the Taqah area, only four were selected for analysis. These four wells have the most complete measurement records and provide sufficient information for assessing groundwater quality in the selected basin. Moreover, all the other wells in the Taqah area are close to each other; we therefore ignored them because they add no further information.

Identifying the domain variables (pollution constituents) and the causal relationships between them constitutes the main part of the development process. In our study, we considered only the dependencies between TDS, EC, and water pH. In the Sultanate of Oman, these are the main factors that researchers in the area have dealt with, and good data are therefore maintained for them. In fact, we used our literature-based network structure as a starting point for discussion with these researchers, both to explain the BN approach and to obtain their input.
In addition, we analyzed the data collected from many wells, and the results revealed that these chemical parameters are useful indicators of groundwater quality because they account for most of the variance in the data. The EC of the water has been used as a measure of the salinity hazard of groundwater used for irrigation in the Salalah plain. According to international water-quality standards, irrigation water with EC values up to 1 mS/cm is safe for all crops, values between 1 and 3 mS/cm are acceptable, and values higher than 3 mS/cm restrict the use of the water for many irrigated crops. Changes in conductivity can be caused by changes in the water content of the soil and by soil or groundwater contamination.

The TDS limit is 600 mg/L, which is the objective of the current plan of the MWR. TDS comprises several dissolved solids, but 90% of its concentration is made up of six constituents: sodium (Na), magnesium (Mg), calcium (Ca), chloride (Cl), bicarbonate (HCO3), and sulfate (SO4). We therefore considered only these constituents in the calculation of TDS, which is represented as a node without parents in the network structure. This simplification is necessary to make the problem tractable and to keep it consistent with the available data, without significant loss of information. Other factors, which are considered less significant to groundwater quality in Oman, were not recorded and were therefore neglected in this study. We also used the following relationship between TDS and EC (Wu-Seng 1993):

$$\mathrm{TDS} = A \times \mathrm{EC},$$

where A is a constant with a value between 0.75 and 0.77.

Both TDS and EC can affect water acidity, i.e., pH. Dissolved chemical constituents tend to occur in higher concentrations at lower pH (higher acidity). In addition, acidity implies the migration of hydrogen ions (H+), which contributes to conductivity. Our work therefore concentrated on the following relations:

TDS → EC;  EC → pH;  TDS → pH.
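The EC thresholds and the TDS–EC relation above translate directly into code. In this sketch the class boundaries follow the text (up to 1 mS/cm, 1 to 3 mS/cm, above 3 mS/cm); assigning exactly 1 and 3 mS/cm to the lower class, the default A = 0.76, and the unit convention for TDS = A × EC (EC in µS/cm, TDS in mg/L) are assumptions, since the text does not state them. The function names are hypothetical.

```python
def irrigation_class(ec_ms_cm: float) -> str:
    """Salinity-hazard class of irrigation water from EC in mS/cm."""
    if ec_ms_cm <= 1.0:
        return "safe for all crops"
    if ec_ms_cm <= 3.0:
        return "acceptable"
    return "restricted for many crops"


def tds_from_ec(ec_us_cm: float, a: float = 0.76) -> float:
    """TDS = A * EC with 0.75 <= A <= 0.77 (assumes EC in uS/cm, TDS in mg/L)."""
    if not 0.75 <= a <= 0.77:
        raise ValueError("A is expected to lie between 0.75 and 0.77")
    return a * ec_us_cm
```

For example, an EC of 2 mS/cm falls in the "acceptable" class, and with A = 0.76 an EC of 1000 µS/cm corresponds to a TDS of about 760 mg/L.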
TABLE 3 Drinking Water Standard

Element            Limit for drinking water
pH                 7.0–8.5
Chloride (mg/l)    250
TDS                500–1000
Sulphate (mg/l)    200
Copper (mg/l)      1.3
Iron (mg/l)        0.5
Sodium             200–400

Table 3 shows the drinking-water limits for a number of constituents; the maximum allowable TDS in drinking water adopted here is 600 mg/l. The data sample is divided into two intervals (categories), using TDS = 550 as the cut point: the first category has TDS < 550 and the second has TDS ≥ 550. For EC, we also divided the data sample into two categories: EC < 670 and EC ≥ 670. Likewise, we divided the pH data into two categories: pH < 7.5 and pH ≥ 7.5. The data table and the probability tables produced by this analysis for two wells are as in Table 3. Table 4 shows the monitoring measurements of the main components of TDS, along with the measurements of EC, for Well 001/577. To analyze the relations mentioned above, the following probabilities were calculated:

$$P(\mathrm{TDS} < 550) = 0.556, \qquad P(\mathrm{TDS} \ge 550) = 0.444.$$

From the relationship between TDS and EC, the conditional probability presented in Table 5 was produced. Table 6 gives the conditional probability of pH given TDS and EC. We obtained the conditional probability tables for the other wells in the same way.
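The two-category discretization can be written as a small helper. The cut points are those given above (TDS 550, EC 670, pH 7.5); a value equal to a cut point is placed in the upper category, following the "≥" used for the upper categories and in Table 6. The names `discretize` and `relative_frequency` are hypothetical.

```python
TDS_CUT = 550.0  # mg/L
EC_CUT = 670.0   # same units as the EC column of Table 5
PH_CUT = 7.5


def discretize(tds: float, ec: float, ph: float) -> tuple[str, str, str]:
    """Map one record to two states: "low" below the cut point, "high" otherwise."""
    return ("low" if tds < TDS_CUT else "high",
            "low" if ec < EC_CUT else "high",
            "low" if ph < PH_CUT else "high")


def relative_frequency(states: list[str], value: str) -> float:
    """Prior probability of a state, estimated as its relative frequency."""
    return sum(s == value for s in states) / len(states)
```

For example, the 1984 record of Table 5 (TDS 542.7, EC 548, pH 7.3) maps to ("low", "low", "low"), and the 2004 record (1143, 926, 7.55) maps to ("high", "high", "high"). Applying `relative_frequency` to a well's discretized TDS states gives estimates such as P(TDS < 550).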
TABLE 4 TDS and EC Data for Well 001/577

Year  Mg     SO4  Na     Ca      Cl     HCO3    TDS      EC
1994  17     14   31     96.11   51     186.9   2203.11  671
1995  7      11   20     60      35     124.4   2128     386
1996  16     25   28.5   52      60.91  156.8   2178.41  491.25
1997  12.62  19   30.3   101.65  51     314.15  2211.57  741
1998  11.75  16   32     92.6    62     114.7   2212.35  430
1999  13.57  15   26.47  98.31   55     336.6   2207.35  726.5
2000  14.4   20   27.7   104     59     310.2   2225.1   668
2001  15.5   23   32.2   135     61     320.5   2267.7   727
2002  12.4   17   28.1   115     67     315.6   2241.5   716.25
2003  13.7   20   29.5   121     68     316.5   2255.2   753
2004  14.7   21   31     130     71     318     2271.7   796

TABLE 5 TDS, EC, pH, COD and NO3 Data for the Well 001/577

Year  TDS (mg/L)  EC (mS/cm)  pH    COD (mg/L)  NO3 (mg/L)
1984  542.7       548         7.3   11.7        17.26
1985  525.5       548         7.8   16.5        17.14
1986  565.4       579         7.75  15.2        17.62
1987  604.2       588         7.57  14.5        18.93
1988  541.8       601         7.43  14.7        20.16
1989  565.9       625         7.34  13.8        19.39
1990  558.6       638         7.32  13.0        18.74
1991  640.4       798         7.27  12.8        17.67
1992  754.5       739         7.24  12.2        20.71
1993  798.7       758         7.28  13.9        15.38
1994  746.4       799         7.29  14.5        18.15
1995  615.8       514         7.3   14.6        31.95
1996  737.5       619         7.28  14.3        32.13
1997  753.6       869         7.19  12.7        42.95
1998  935.6       558         7.15  12.6        42.73
1999  1174        855         7.15  12.6        43.20
2000  1021        796         7.06  11.3        41.22
2001  1067        855         6.98  5.7         42.35
2002  1223        844         6.94  5.3         42.58
2003  1055        881         7.14  12.5        40.65
2004  1143        926         7.55  16.7        42.81

Nitrate (NO3) is an increasingly important indicator of water pollution from animal waste, human waste, fertilizers, and solid waste. Nitrate and ammonium are indicators of pollution because both are soluble in water and can penetrate to deeper zones underground. NO3 concentrations are classified into three classes (MOE 2008). In the first class, the NO3 concentration is between 0 and 10 mg/l (the natural level of NO3 in groundwater; excellent water for drinking). In the second class, it is between 10 and 45 mg/l (drinkable but contaminated). In the third class, it is higher than 45 mg/l (contaminated and non-drinkable).
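The three nitrate classes can be checked against the NO3 column of Table 5. Because the text gives touching ranges (0–10, 10–45, above 45 mg/l), this sketch assumes that exactly 10 mg/l and exactly 45 mg/l both fall in the second class; `nitrate_class` is a hypothetical helper.

```python
def nitrate_class(no3_mg_l: float) -> int:
    """MOE (2008) nitrate classes: 1 = 0-10, 2 = 10-45, 3 = above 45 mg/l."""
    if no3_mg_l < 0:
        raise ValueError("concentration cannot be negative")
    if no3_mg_l < 10:
        return 1  # natural level, excellent for drinking
    if no3_mg_l <= 45:
        return 2  # drinkable but contaminated
    return 3      # contaminated and non-drinkable


# NO3 column of Table 5 (Well 001/577, 1984-2004), mg/l
NO3 = [17.26, 17.14, 17.62, 18.93, 20.16, 19.39, 18.74, 17.67, 20.71, 15.38, 18.15,
       31.95, 32.13, 42.95, 42.73, 43.20, 41.22, 42.35, 42.58, 40.65, 42.81]
classes = [nitrate_class(v) for v in NO3]
```

All 21 yearly values for Well 001/577 fall in the second class (drinkable but contaminated), although every value from 1997 onward (40.65 to 43.20 mg/l) is close to the 45 mg/l limit.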
Our statistical analysis revealed a correlation between salinity, as represented by TDS, and nitrate. This correlation allowed us to incorporate nitrate in our Bayesian models. TDS reflects not only water-rock interaction and residence time but also sources of contamination, whereas nitrate represents only a source of contamination, either organic or inorganic.

TABLE 6 The Conditional Probability Table (CPT), Which Shows the Conditional Probability of pH Given TDS and EC

Well 577     TDS < 550               TDS ≥ 550
             EC < 670   EC ≥ 670     EC < 670   EC ≥ 670
pH < 7.5     0.333      0            0          0.667
pH ≥ 7.5     0.667      1            1          0.333

The chemical oxygen demand (COD), which reflects the organic and inorganic content of the water, is also an important constituent in the pollution assessment of groundwater in Oman. The positive correlation between COD and pH allowed our Bayesian networks to include this constituent. COD is classified into five classes (MOE 2008): excellent water for drinking (COD between 0 and 1 mg/l), suitable for bathing (between 1 and 3 mg/l), suitable for agriculture (between 3 and 5 mg/l), industrial water (less than 8 mg/l), and very highly contaminated water (higher than 8 mg/l). In this study, however, we considered only the first three classes.

We processed the datasets for the other wells in the same way to build a static Bayesian network (BN) representing each well. We tested these BNs with different values of TDS, EC, pH, NO3, and COD taken from the collected data for the four wells selected for the development of this predictive model. Once the static BN for each monitoring well was built, parameterized, and tested, we used these models as initial building networks in the construction of two OOBNs for groundwater quality prediction.
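A CPT such as Table 6 is obtained by counting discretized records. Because the underlying records for Well 577 are not listed, the sketch below uses nine hypothetical records chosen so that the counts reproduce Table 6 and the reported P(TDS < 550) = 0.556 (5 of 9); they are illustrative, not the paper's data, and `estimate_cpt` is a hypothetical name.

```python
from collections import Counter, defaultdict

# Hypothetical discretized records (TDS, EC, pH) for one well, chosen to
# reproduce Table 6 and P(TDS < 550) = 5/9; not taken from the paper's data.
RECORDS = [
    ("low", "low", "low"), ("low", "low", "high"), ("low", "low", "high"),
    ("low", "high", "high"), ("low", "high", "high"),
    ("high", "low", "high"),
    ("high", "high", "low"), ("high", "high", "low"), ("high", "high", "high"),
]
STATES = ("low", "high")


def estimate_cpt(records):
    """P(pH | TDS, EC) from counts, one distribution per observed parent configuration."""
    counts = defaultdict(Counter)
    for tds, ec, ph in records:
        counts[(tds, ec)][ph] += 1
    cpt = {}
    for parents, cnt in counts.items():
        n = sum(cnt.values())
        cpt[parents] = {state: cnt[state] / n for state in STATES}  # unseen state -> 0
    return cpt


cpt = estimate_cpt(RECORDS)
p_tds_low = sum(r[0] == "low" for r in RECORDS) / len(RECORDS)
```

Counting so few records produces hard zeros and ones, as in Table 6, which rule out combinations that may simply not have been observed; adding a small pseudo-count to every cell (Laplace smoothing) is a common remedy.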
The first OOBN, shown in Figure 7, models the time slices for each well, capturing the temporal repetition of identical model structures; its initial building network (see Figure 8) describes a generic time-sliced network.

FIGURE 7 The OOBN representing three time-sliced networks.

FIGURE 8 The initial building block representing one time-sliced network.

The second OOBN consists of four identical initial BNs (one per well), interconnected to cover the whole area under study. Figure 9 shows a typical OOBN representing the four monitoring wells in the Salalah plain, and Figure 10 shows the initial building network from which this OOBN is generated.

FIGURE 9 The OOBN representing four wells.

FIGURE 10 The initial building network for constructing identical interconnected networks.

Temporal Networks

As mentioned above, the initial building network of the system is a one-time-step network representing a year. It models the data sampled from each of the four wells selected for this study. This one-time-step network fragment, shown in Figure 10, represents a class in the object-oriented paradigm. Objects (entities with identity, state, and behavior) are instances of classes, which correspond to type declarations in traditional programming languages. In this context, a class is a description of an object by its structure, behavior, and attributes. Whenever an object of a class is needed, an instance of that class is created. The initial building network is therefore a class containing the following three sets of nodes:

• A set of input nodes: Input nodes act as placeholders for parents of nodes inside instances of the class. They cannot have parents within the class.
• A set of output nodes: They are connected to the input nodes of the next time-slice network; hence they can be parents of nodes outside instances of the class.
• A set of protected nodes: These nodes can only have parents and children inside the class itself.

The input and output nodes are collectively referred to as interface nodes; see Kjaerulff (1995) and Nicholson and Brady (1994) for more details. The final OOBN is constructed by creating instance networks (objects) from the basic building network, spanning a number of time slices. Figure 10 shows a single time-slice class for well MW1, and Figure 7 shows an OOBN representing three time-sliced networks.

Basin Monitoring

We model each of the four selected wells in the Taqah area with an initial building network (generic model). These models are, however, identical both qualitatively and quantitatively (in structure and conditional probability tables), so they can be modeled by a single class containing input, output, and protected nodes. Each object is therefore an instance of this class representing one well. The instances are interconnected in order to cover the whole basin. Figure 9 shows an OOBN representing the four monitoring wells in the Taqah area.

Improving the Conditional Probability Table (CPT) Derivation Process

One of the problems with developing BBNs for groundwater quality is the difficulty inherent in establishing CPTs. Methods are therefore needed that make it easier to select and justify CPT values. Studies have shown that sensitivity analysis can identify the relative importance of parameters in a BBN for overall BBN performance, so concentrating efforts on gathering accurate data for the most important CPTs saves time and effort. Sensitivity analysis has also been used for validating BBNs. In addition to using the preprocessing technique for correcting the laboratory data, we used the SamIam tool (Chan and Darwiche 2002) to analyze the dependencies between variables in our Bayesian networks.
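The class-and-instance construction described above (input, output, and protected nodes; instances linked from one time slice to the next) can be mimicked in plain Python to see how a time-sliced OOBN is wired. This is not the Hugin API: `NetworkClass` and `unroll` are hypothetical names, the slice structure (a previous-year TDS input feeding TDS, which feeds EC and pH) is only an example built from the relations TDS → EC, EC → pH, and TDS → pH, and the sketch checks only the interface rules (input nodes have no parents inside the class; only output-to-input links cross slices).

```python
from dataclasses import dataclass


@dataclass
class NetworkClass:
    """A network class with interface (input/output) nodes and protected nodes."""
    name: str
    inputs: list[str]
    outputs: list[str]
    protected: list[str]
    arcs: list[tuple[str, str]]  # (parent, child) arcs inside the class

    def __post_init__(self) -> None:
        known = set(self.inputs) | set(self.outputs) | set(self.protected)
        for parent, child in self.arcs:
            if parent not in known or child not in known:
                raise ValueError(f"unknown node in arc {parent} -> {child}")
            if child in self.inputs:
                raise ValueError(f"input node {child} cannot have a parent in the class")


def unroll(cls: NetworkClass, n_slices: int, link: dict[str, str]):
    """Create n_slices instances of cls and connect each output node of slice t
    to the input node of slice t + 1 named by link[output]."""
    for out, inp in link.items():
        if out not in cls.outputs or inp not in cls.inputs:
            raise ValueError("a link must join an output node to an input node")
    labels = [f"{cls.name}[{t}]" for t in range(n_slices)]
    arcs = [(f"{lab}.{p}", f"{lab}.{c}") for lab in labels for p, c in cls.arcs]
    for prev, nxt in zip(labels, labels[1:]):
        arcs += [(f"{prev}.{out}", f"{nxt}.{inp}") for out, inp in link.items()]
    return labels, arcs


# Hypothetical one-year slice for well MW1 (node names are illustrative).
SLICE = NetworkClass(
    name="MW1",
    inputs=["TDS_prev"],
    outputs=["TDS"],
    protected=["EC", "pH"],
    arcs=[("TDS_prev", "TDS"), ("TDS", "EC"), ("EC", "pH"), ("TDS", "pH")],
)
labels, arcs = unroll(SLICE, 3, {"TDS": "TDS_prev"})
```

Unrolling three slices yields 12 within-slice arcs plus 2 temporal arcs (MW1[0].TDS → MW1[1].TDS_prev and MW1[1].TDS → MW1[2].TDS_prev); protected nodes such as EC and pH never receive arcs from outside their own instance.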
SamIam suggests single and multiple parameter changes that satisfy an expert's query constraint; it helps the user make the smallest possible change to a parameter value that satisfies the constraint.

APPLICATION RESULTS

We found that the developed BNs provide a useful way of exploiting the datasets currently maintained by the MWR. The validation results gave experts a realistic assessment of the chances of achieving desired outcomes. The application was carried out with special emphasis on the advantages of Bayesian over traditional techniques. It involved monitoring the groundwater quality parameters and validating crucial assumptions.

We tested the resulting network models in two phases. In the first phase, we examined data produced by our preprocessing model, organized as yearly measurements covering the whole basin. The first task was to identify dependencies between the groundwater quality variables in order to detect useful information on the process dynamics. The resulting network agreed very closely with the experts' intuition. In the second phase, the aim was to test the constructed OOBNs for predicting future values of the variables. The resulting networks were investigated with the Hugin Bayesian network inference tool by analyzing measurement data from three successive time slices. Using BNs to predict future values requires discovering a dependency model that relates variables from successive time slices or otherwise embeds temporal features into the model. In Bayesian reasoning, the marginal probability distribution of any node may be updated upon acquiring evidence for other nodes.

CONCLUSION AND FURTHER WORK

This work addresses the assessment of groundwater quality. Bayesian methods have been investigated and shown to offer considerable potential for groundwater quality prediction.
Bayesian networks are based on reasoning under conditions of uncertainty, and they effectively represent the relationships between the constituents of groundwater quality. The simple BNs presented here are a first step towards a comprehensive network that includes the other variables researchers consider significant for assessing groundwater quality, in particular in the Salalah plain. These variables include:

• NO₃⁻: Nitrate is an increasingly important indicator of water pollution from animal waste, human waste, fertilizers, and solid waste. Nitrate and ammonium are indicators of pollution because both are soluble in water and can penetrate to deeper zones underground.
• Microbiological indicator organisms such as E. coli and fecal coliform bacteria. For the most part, these organisms are not harmful themselves, but they indicate the presence of fecal material, which may contain disease-causing (pathogenic) organisms.
• Chemical oxygen demand (COD), a parameter that reflects the organic and inorganic content of the water and is mainly needed for pollution assessment.

Data were collected from many monitoring wells in the Salalah plain, which is located in the south of the Sultanate of Oman. We spent significant time and effort gathering sufficient relevant data for this study. We plan to continue this work by adding these variables to the resulting models in order to improve their predictive accuracy. We also demonstrated the general benefit of using OOBNs, in which identical structures can be interconnected to represent a network of successive time slices, i.e., a DBN.

REFERENCES

Anderson, M. and W. Woessner. 1992. Applied Groundwater Modeling. Academic Press.
Banerjee, A. K., P. L. Plantinga, and J. Ramirez. 1985. Monitoring Groundwater Quality: A Bayesian Approach. TR no. 773. Department of Statistics, University of Wisconsin, USA.
Borsuk, M., C. Stow, and K. Reckhow. 2004. A Bayesian network of eutrophication models for synthesis, prediction, and uncertainty analysis. Ecological Modelling 173:219–239.
Brandherm, B. and A. Jameson. 2004. An extension of the differential approach for Bayesian network inference to dynamic Bayesian networks. International Journal of Intelligent Systems 19(8):727–748.
Chan, H. and A. Darwiche. 2002. When do numbers really matter? Journal of Artificial Intelligence Research 17:265–287.
Chong, H. G. and W. J. Walley. 1996. Rule-base versus probabilistic approaches to the diagnosis of faults in wastewater treatment processes. Artificial Intelligence in Engineering 10(3):265–273.
Dagum, P. and A. Galper. 1993. Additive belief-network models. In UAI 93: Proceedings of the Ninth Annual Conference on Uncertainty in Artificial Intelligence, July 9–11, 1993, D. Heckerman and E. H. Mamdani (Eds.), pp. 91–98. Morgan Kaufmann.
Dames and Moore. 1992. Investigation of the Quality of Groundwater Abstracted from the Salalah Plain: Dhofar Municipality. Final Report.
Egerton, A. 1996. Achieving reliable and cost effective water treatment. Water Science and Technology 33:143–149.
Entec Europe Limited. 1998. Consultancy Services for the Study of Development Activities on Groundwater Quality of Wadi Adai, Al Khawd and Salalah Well Field Protection Zones. Contract No. 96-2133, Final Report, Volume 4: Hydrogeology and Modeling. Salalah: Ministry of Water Resources.
Forbes, J., T. Huang, K. Kanazawa, and S. Russell. 1995. The BATmobile: Towards a Bayesian automated taxi. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-95), pp. 1878–1885.
Guisan, A., T. Edwards, and T. Hastie. 2002. Generalized linear and generalized additive models in studies of species distributions: Setting the scene. Ecological Modelling 157:89–100.
Hobbs, B. F. 1997. Bayesian methods for analyzing climate change and water resource uncertainties. Journal of Environmental Management 49:53–72.
HUGIN Expert Brochure. 2005. Aalborg, Denmark: HUGIN Expert A/S. Available online at http://www.hugin.com. Accessed February 10, 2008.
Jensen, F. V. 2001. Bayesian Networks and Decision Graphs. Springer.
Kim, S., S. Imoto, and S. Miyano. 2004. Dynamic Bayesian network and nonparametric regression for nonlinear modeling of gene networks from time series gene expression data. Biosystems 75:57–65.
Kjaerulff, U. 1995. dHugin: A computational system for dynamic time-sliced Bayesian networks. International Journal of Forecasting 11:89–111.
Korb, K. B. and A. E. Nicholson. 2004. Bayesian Artificial Intelligence. Chapman & Hall/CRC.
Lamon, E. C. and C. A. Stow. 2004. Bayesian methods for regional-scale lake eutrophication models. Water Research 38:2764–2774.
Milligan, H. and A. Gharbi. 1995. Groundwater management on the Salalah Plain. Oman Ministry of Water Resources. International Conference on Water Resources Management in Arid Countries 2:530–538.
Ministry of Water Resources (MWR), Sultanate of Oman. 2004. Law on the Protection of Water Resources. Promulgated by Decree of the Sultan No. 29 of 2004, and its implementing regulations (Regulations for the organization of wells and aflaj, and Regulations for the use of water desalination units on wells) (in Arabic).
Nefian, A. V., L. Liang, X. Pi, X. Liu, and K. Murphy. 2002. Dynamic Bayesian networks for audio-visual speech recognition. EURASIP Journal on Applied Signal Processing 11:1–15.
Nicholson, A. E. and J. M. Brady. 1994. Dynamic belief networks for discrete monitoring. IEEE Transactions on Systems, Man, and Cybernetics 24(11):1593–1610.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco: Morgan Kaufmann.
Russell, S. and P. Norvig. 2003. Artificial Intelligence: A Modern Approach. 2nd edition. New Jersey: Prentice Hall.
Shihab, K. and N. Al-Chalabi. 2004. A Bayesian framework for groundwater quality assessment. Lecture Notes in Computer Science 3029:728–738.
Shortliffe, E. H., et al. (Eds.). 1990. Medical Informatics: Computer Applications in Health Care. Reading, MA: Addison-Wesley.
Stow, C. A., M. E. Borsuk, and K. H. Reckhow. 2003. Comparison of estuarine water quality models for TMDL development in Neuse River Estuary. Journal of Water Resources Planning and Management 129:307–314.
Trigg, D. S., W. J. Walley, and S. J. Ormerod. 2000. A prototype Bayesian belief network for diagnosis of acidification in Welsh rivers. In Development and Application of Computer Techniques to Environmental Studies, ed. G. Ibarra-Berastegi, pp. 163–172.
Varis, O. 1995. Belief networks for modeling and assessment of environmental change. Environmetrics 6:439–444.
Wu-Seng, L. 1993. Water Quality Modeling. CRC Press.
Zou, M. and S. D. Conzen. 2005. A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics 21(1):71–79.

Journal

Applied Artificial Intelligence – Taylor & Francis

Published: Apr 18, 2008
