Overpumping leads to California groundwater arsenic threat

Overpumping leads to California groundwater arsenic threat ARTICLE DOI: 10.1038/s41467-018-04475-3 OPEN Overpumping leads to California groundwater arsenic threat 1 1 2 Ryan Smith , Rosemary Knight & Scott Fendorf Water resources are being challenged to meet domestic, agricultural, and industrial needs. To complement finite surface water supplies that are being stressed by changes in precipitation and increased demand, groundwater is increasingly being used. Sustaining groundwater use requires considering both water quantity and quality. A unique challenge for groundwater use, as compared with surface water, is the presence of naturally occurring contaminants within aquifer sediments, which can enter the water supply. Here we find that recent groundwater pumping, observed through land subsidence, results in an increase in aquifer arsenic concentrations in the San Joaquin Valley of California. By comparison, historic groundwater pumping shows no link to current groundwater arsenic concentrations. Our results support the premise that arsenic can reside within pore water of clay strata within aquifers and is released due to overpumping. We provide a quantitative model for using subsidence as an indicator of arsenic concentrations correlated with groundwater pumping. 1 2 Department of Geophysics, Stanford University, 397 Panama Mall, Stanford, California 94305, USA. Department of Earth System Science, Stanford University, 473 Via Ortega, Stanford, California 94305, USA. Correspondence and requests for materials should be addressed to R.S. (email: rgsmith@stanford.edu) NATURE COMMUNICATIONS (2018) 9:2089 DOI: 10.1038/s41467-018-04475-3 www.nature.com/naturecommunications 1 | | | 1234567890():,; ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04475-3 lobally, groundwater provides almost half of all drinking quantitative link between overpumping of groundwater systems water , making it one of the world’s most important and arsenic contamination, a mechanism that has been proposed Gresources. Within the United States, the Central Valley of previously but never statistically demonstrated. California accounts for roughly 20% of groundwater with- drawals . The Central Valley is an arid region that supports a $17 billion agricultural industry. In the southern, highly productive Results and Discussion region of the valley (known as the San Joaquin Valley), aquifers Modeling arsenic concentration. We integrated estimates of are particularly stressed due to the high water demands. Further, subsidence with additional variables known from previous stu- groundwater is the main source of drinking water for roughly one dies to affect arsenic levels into a random forest model that million people in the San Joaquin Valley. Herein, we focus on accounts for nonlinear relationships, as well as inter- threats to groundwater quality induced by naturally occurring dependencies of different variables. Two models were devel- arsenic, which may have devastating impacts on both human oped, one that predicts recent (2007 to 2015) arsenic con- health and food production. centrations and a second that predicts historic (1986 to 1993) Arsenic is a ubiquitous, naturally occurring contaminant that is arsenic concentrations. These dates were chosen because they a common problem in many aquifers that are pumped for both span long droughts in the San Joaquin Valley, during which drinking water ; most notably, it is presently having a devasting time appreciable groundwater decline and subsidence occurred impact on groundwater quality throughout Asia . When present (Supplementary Fig. 5). We did not include the mild-drought in significant amounts, it increases the risk of cancer, heart dis- conditions from 1999 to 2005, because we had limited subsidence ease, and diabetes . Hazardous levels of arsenic typically result data over that period, and because we considered more intense from anaerobic conditions, as noted for the shallow, Holocene droughts to produce a stronger signal for our analysis. We used aquifers of Asia , or from high pH (pH > 8.5) often observed in the output of the random forest models to assess the effect of high arsenic regions of the Andes . In addition, overpumping an overpumping as indicated by subsidence on arsenic levels. We aquifer system has been noted to increase the arsenic levels in found that in both models, concurrent subsidence markedly Southeast Asia by drawing arsenic from less-permeable anaerobic increases the risk of arsenic contamination. As both models clay strata into the aquifer . establish similar relationships (capturing the time periods of Arsenic within pumped groundwater of the San Joaquin Valley recent and past periods of drought), we focus our discussion on has been noted for decades. Approximately 10% of the wells the recent arsenic concentration model, pointing out where key tested within the last 10 years have shown arsenic (As) con- differences exist. The full results of both models are shown in the centrations above 10 µg/L (p.p.b.), the level recommended as the Supplementary Information (Supplementary Figs. 1-4) maximum acceptable by the World Health Organization (WHO). As groundwater is increasingly being pumped to meet agri- cultural and domestic needs, preserving groundwater resources Mechanism for arsenic contamination from overpumping. The (i.e., maintaining water quality) is imperative. Here we seek to relationship between subsidence and arsenic concentration is determine the source of arsenic contamination within ground- linked to clay strata (Fig. 1). It has been noted that clay particles water of the Central Valley, CA, and to develop quantitative become enriched in arsenic due to their reactivity and high sur- means for predicting degradation of groundwater quality. We face area to volume ratio relative to sand-sized particles . Arsenic focus our study on the Tulare basin (Fig. 1), a highly productive has been transported to the San Joaquin Valley from the Sierra agricultural region of the San Joaquin Valley where groundwater Nevada and coastal mountain ranges by rivers, which cut through is essential for meeting water demands. arsenic-bearing formations, for millions of years . Clays at or Groundwater pumping in the San Joaquin Valley has caused near the surface at the time of deposition are the primary host of declines of ~ 60 m in groundwater levels over the past century, transported arsenic, which has been shown to adsorb on clay leading to subsidence as high as ~ 9 m from 1925 to 1970, a rate 15 surfaces in significant amounts in the San Joaquin Valley ,ina of 20 cm/year . Rates of subsidence as high as 25 cm/year have process similar to what occurs in other sedimentary basins, been observed in more recent droughts spanning from 2007 to 4 including those throughout Asia . As the clays are buried over 9, 10 2010 and from 2012 to 2016 . Subsidence due to groundwater geologic time, their restricted oxygen supply results in reduction pumping is caused by a pore pressure drop in aquifer materials, 14 of the arsenic at depths > 60 m , resulting in dissolution of the which increases the effecive stress, σ ,defined as σ = σ – P , e e T p arsenic within the clay pore water . This mechanism for arsenic where σ is the total stress and P is the pore pressure . 13 T p concentration in clays is supported by other studies in the area , Increasing the effective stress results in aquifer compaction. The who have found a positive relationship between aquifer clay majority of compaction and linked subsidence of the overlying content and arsenic concentration in the San Joaquin Valley. ground surface is caused by drainage of clays due to their weaker When unperturbed, groundwater within the aquifer primarily geomechanical properties, as indicated by their higher skeletal flows horizontally through the sediments with highest perme- specific storage. Thus, the high subsidence levels provide a ability, typically sands and gravels. Thus, pumped groundwater measure of overpumping in aquifers having substantial clay comes mostly from sands and gravels, which have lower arsenic content, such as the aquifers of the San Joaquin Valley. In this concentrations. When the aquifer system is stressed from region, the upper 500 m, which is the greatest depth typically overpumping, high vertical hydraulic gradients cause a larger drilled for groundwater pumping, consists of alternating layers of volume of water to be drawn from less-permeable clays, inducing sand, gravel, and clay. In general, the aquifer system is divided the release of water with high arsenic concentrations. As a into an upper aquifer, a thick clay confining unit known as the consequence, overpumping increases arsenic concentrations, as Corcoran clay, and a lower aquifer. The upper and lower aquifers noted for the lower Mekong Delta, Vietnam . Concomitant with contain sands and gravels, as well as numerous thin clay layers . decreased pore pressures is compaction of the aquifer and As a measure of overpumping, we use subsidence, derived resulting land subsidence. Thus, the dual consequence of from Interferometric Synthetic Aperture Radar (InSAR) data, to overpumping is land subsidence and increased extraction of pore predict arsenic levels quantitatively. water from clay layers, giving rise to a link between land We find that subsidence data have a strong correlation with subsidence and groundwater arsenic concentrations. concurrent arsenic concentrations. This demonstrates a 2 NATURE COMMUNICATIONS (2018) 9:2089 DOI: 10.1038/s41467-018-04475-3 www.nature.com/naturecommunications | | | 0 NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04475-3 ARTICLE Land subsidence, 2007- 2010 (cm/year) 37°N Subsidence data extent Well arsenic concentrat- ion, 2007-2015 (p.p.b.) Corcoran clay extent 36°N 1000 121°W 120°W 119°W 118°W Fig. 1 Comparison of subsidence, arsenic concentration, and clay extent within the San Joaquin Valley, CA. Arsenic concentrations increase toward the center of the valley where subsidence is more extensive and a confining clay layer (known as the Corcoran clay) is present . The grayed-out region is the area where recent subsidence data (obtained from InSAR) were processed. Arsenic concentrations vary by orders of magnitude and are thus shown with a logarithmic color bar (see Supplementary Fig. 8 for arsenic concentration histograms). The arsenic concentration data points were sourced from . The basemap was created using elevation data from the Shuttle Radar Topography Mission Quantifying the impact of overpumping on groundwater easily seen in InSAR data, should improve water quality, at least arsenic concentrations. We highlighted four of the most for the San Joaquin Valley. important variables describing arsenic concentration within the The historic arsenic concentration model shows a weaker, but Tulare Basin in the recent model, shown in Fig. 2a-d. Of these, still positive, correlation between Corcoran clay thickness and the thickness of the Corcoran Clay (a confining unit that overlies arsenic concentration, as well as manganese concentration and a lower aquifer) shows a positive correlation with arsenic con- arsenic concentration. Due to the dynamic nature of groundwater centrations due to increased clay content. Elevation has a negative chemistry, the relative importance of different mechanisms for correlation, as lower areas are more likely to have been water- arsenic contamination change over time, highlighting the need for saturated and thus anaerobic. A positive correlation was found regular calibration with up to date datasets. between log (Mn) and arsenic concentrations, as the presence of To further quantify the impact of overpumping on arsenic manganese indicates an anoxic environment, in which arsenic concentrations, we simulated three different hypothetical scenar- tends to be more soluble. Significantly, recent subsidence from ios with our predictive random forest model: oxic, suboxic, and InSAR showed a positive correlation, as overpumping leads to anoxic subsurface conditions. We used a variation of the partial increased pore water drainage from clays. The first three variables dependence plot that averages all variables except subsidence, are well-known from the literature and not related to human which is varied from 0 to 10 cm/year, and log (Mn), which is set activity. The quantitative link between pumping-induced sub- at 0, 1, and 2 for oxic, suboxic, and anoxic scenarios, respectively, sidence and arsenic concentrations has not been shown before, to estimate the probability that the predicted arsenic levels and is directly related to human activity. exceeded the WHO’s standard of 10 µg/L arsenic (Fig. 3). At ca. 8 By comparing the historic and recent arsenic concentration cm/year subsidence, the probability that arsenic exceeds the models, we see the transient nature of the relationship between WHO standard dramatically accelerates. This inflection point is pumping-induced subsidence and arsenic concentrations (Fig. 2e). also present at roughly the same location in the historic arsenic Although historic subsidence has a high impact on historic model (Fig. 2e). As inelastic deformation of clay beds begins, the arsenic concentrations, it has virtually no impact on recent rate of subsidence increases significantly . As inelastic deforma- arsenic concentrations. A lack of correlation between present tion occurs when pore pressure drops below the lowest level arsenic concentrations and historic subsidence (related to over- previously experienced, it releases pore water that has not been pumping) suggests that arsenic levels due to overpumping slowly mixed with the main aquifer previously, which in the present case return to their original levels after the groundwater pumping is results in arsenic-rich pore water being captured within the decreased, implying that arsenic is flushed from the auqifer. Thus, pumped groundwater. our findings indicated that avoiding overpumping of aquifers, NATURE COMMUNICATIONS (2018) 9:2089 DOI: 10.1038/s41467-018-04475-3 www.nature.com/naturecommunications 3 | | | Elevation (m) 0 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04475-3 ab e 1.1 1.1 Predicted historic arsenic Predicted recent arsenic 0.9 0.9 2.0 0.7 0.7 0.5 0.5 1.5 0 10 20 30 40 0.0 0.5 1.0 1.5 2.0 2.5 Corcoran clay thickness (m) log (Mn) (p.p.b.) 1.0 cd 1.1 1.1 0.9 0.9 0.5 0.7 0.7 0.5 0.5 0.0 60 80 100 120 140 2.5 5.0 7.5 10.0 5 10 15 20 Elevation (m) Recent subsidence (cm/year) Modeled subsidence from 1986 to 1993 (cm/year) Fig. 2 Partial dependence plots of primary descriptors of groundwater arsenic concentrations. These include a confining clay thickness, b dissolved Mn concentrations, c elevation, and d recent subsidence. e Comparison of partial dependence plots of historic and recent subsidence. Note that historic subsidence has a great effect on historic arsenic levels but little effect on recent arsenic levels (e), whereas recent subsidence has a great effect on recent arsenic levels (d). All blue lines are derived from models predicting recent (2007–2015) arsenic concentrations, whereas the red line is derived from models predicting historic (1986–1993) arsenic concentrations Oxidation state 1 Oxic 2 Suboxic 3 Anoxic Methods Data acquisition and processing. There are many factors that control natural 0.3 variation in arsenic concentration. We developed a statistical model using numerous datasets to asses the role each plays as a predictor of arsenic con- centrations. We obtained arsenic level data, as well as proxy information on redox potential from manganese and sulfate concentrations, from 838 wells over the 2007 to 2015 time frame, and 424 wells over the 1986 to 1993 time frame, from the 0.2 GAMA database , which monitors groundwater quality in California (http:// geotracker.waterboards.ca.gov/gama/datadownload). We used a previously devel- oped groundwater model which provided us with several useful predictors of arsenic concentrations: percent of total water use from groundwater, which is directly related to pumping; total aquifer clay content, which sets an upper limit on 0.1 available arsenic from clays; top and bottom of perforated well interval, both of which indicate the depth from which pumping occurred, a soft indicator of the oxidation state; subsidence from 1962 to 1976 and subsidence from 1986 to 1993, which provide historic measures of pumping-induced clay drainage. Elevation was 036 9 also used as a predictor, obtained from the Shuttle Radar Topography Mission ; Land subsidence, 2007 to 2010 (cm/year) this is an indicator of closed-basin conditions, where contaminants such as arsenic tend to be concentrated. Slope was also derived from this dataset. As slope closely Fig. 3 Probability that arsenic levels exceeded the WHO standard. This was tracks predevelopment head , we used this with the clay content to estimate calculated based on recent subsidence for each arsenic risk category historic groundwater flow, which was also included as a predictor; our assumption being that enhanced groundwater flow would flush the system over time and reduce arsenic concentrations. We used average temperature data from PRISM; temperature variations have been related to changes in arsenic in previous work . Land subsidence due to overpumping increases the probability We also used evapotranspiration (ET) estimates (both from 2002 to 2007 and from that groundwater is contaminated beyond the WHO drinking 2007 to 2015) ; this can be used as a proxy of groundwater pumping that is independent of our InSAR measurements. water standard by a factor of 2 to 3 for the San Joaquin Valley. We chose not to include pesticide application as a predictor of groundwater Importantly, decreasing pumping below the threshold of arsenic contamination. Arsenic-bearing pesticides are largely restricted to lead inestastic aquifer compression will decrease arsenic concentra- arsenicals that were used predominantly on cotton and specific tree fruits (largely tions and the aquifers can recover to normal levels if over- apples) but have been banned since the 1970’s within the United States. Within the 15 4 aerated surface soils, arsenic exists as arsenate , and would have limited mobility . pumping is halted. Thus, subsidence maps produced by InSAR Thus, downward migration of surface applied arsenic is unlikely to contribute to provide a means of measuring the increased risk of arsenic groundwater contamination. Further, our analysis shows a correlation of contamination due to overpumping of aquifers, a critically groundwater arsenic with recent (last 10 years) subsidence and no correlation with important factor as groundwater is the main source of drinking older periods of subsidence. Our time series therefore further removes the possibility of arsenic pesticides impacting groundwater. water for roughly one million people in our study area alone . The arsenic levels, manganese levels and sulfate levels were obtained from Moreover, with a global trend toward increase use of ground- GAMA. These levels were assumed to have a lowest detectable value of 2 p.p.b. water, effectively managing water quality with quantity is from histogram analysis, so all values less than this were set to 2 p.p.b. As essential to preserve the use of this critical resource. measurements of arsenic, sulfate, and manganese vary greatly with high outliers, 4 NATURE COMMUNICATIONS (2018) 9:2089 DOI: 10.1038/s41467-018-04475-3 www.nature.com/naturecommunications | | | Predicted log (As) Predicted log (As) 10 10 Probability As exceeds WHO standards Predicted log (As) Predicted log (As) 10 10 Increase in predicted As concentration (p.p.b.) NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04475-3 ARTICLE Table 1). The resulting values for mtry were both 5, with an MSE of 0.05 and 0.10, Table 1 Summary of scenes used for InSAR processing for the recent and historic random forest models, respectively. We used these calibrated, tuned random forest models to predict arsenic concentrations and establish relationships between the variables used in the model and arsenic Frame (region) Path No. of scenes No. of interferograms concentrations. Tulare 218 15 82 Random forests are one of many machine learning regression algorithms. In Fresno 218 17 71 this study, we also developed models using neural networks, support vector Mendota 219 19 82 machines, and boosted gradient trees. Our analysis showed that random forests provided the best fit to the observed data. However, similar relationships between Porterville 217 17 84 subsidence and arsenic were observed with these additional machine learning methods. the logarithm of each was used in the statistical model to dampen the effect of Variable importance. With random forests, the impact of each variable in the outliers. outcome can be assessed in multiple ways. One way is to randomly shuffle (per- The historic groundwater flow was calculated using hydraulic conductivity (K) mute) each variable, and observe the increase in MSE between predicted and estimates across the Central Valley from the aforementioned groundwater model . observed arsenic concentrations. The higher the MSE, the more important that ∂h We assumed that the historic groundwater gradient ( ) was equal to the land variable is to the outcome. Another way to assess variable importance is to use ∂l surface gradient . We then used Darcy’s equation to estimate flow: partial dependence plots. ∂h q ¼K ð1Þ Partial dependence plots. Partial dependence plots are a common way to visualize ∂l random forest models . The goal of a partial dependence plot is to show how varying one variable impacts the outcome of the prediction, while accounting for where q is the flux per unit area. all possible values from the other variables. The partial dependence plot is calcu- The distance to the nearest river was computed by determining the distance lated as shown below: from each well to all major rivers in the study area, and taking the minimum distance as the distance to the nearest river. fðxÞ¼ fðx; x Þ ð2Þ ic i¼1 InSAR processing. InSAR provides high-quality estimates of subsidence over large regions. InSAR measures subsidence over time with centimeter- to millimeter-scale where fðÞ x is the partial dependence, n is the number of rows of the dataset, x is accuracy over large regions with high (10 s to 100 s of meters) spatial resolution. the predictor variable of interest, x are the values of all other variables, and f is a We acquired 68 SAR scenes from the ALOS PALSAR sensor, covering the time ic function representing the random forest model The output can either be the best period from 2007 to 2010, over four frames (see Table 1) for InSAR processing estimate of log (As), or the probability that arsenic concentration exceeded 10 p.p. from the Alaska Satellite Facility (https://www.asf.alaska.edu/). We then processed 10 b. For the latter, we used all trees from the random forest model. Our random 319 interferograms using these scenes. We used 30 looks in azimuth and 10 looks forest model produced ‘best guess’ estimates of arsenic levels, which are the mean in range to reduce noise in the pixels, resulting in a pixel size of roughly ~ 250 m by of the estimates from all trees. It also produced an estimate for each of the 500 250 m. We unwrapped the interferograms using the snaphu code . Next, we used trees, for each observation. This provided us with a distribution which we used to the small baseline subset method to estimate the long-term deformation signal. determine the probability that the predicted arsenic levels exceeded the WHO This produced maps of the mean subsidence velocity for each of the four frames, standard. which we merged to create one map of mean subsidence velocity. Overlapping Random forest models are limited in their predictions to observed data. For this areas were averaged. We then applied a moving average filter to further smooth the reason, in the partial dependence plots of the main article (Figs. 2 and 3), we results, with a window size of 15 by 15 pixels. As the long-term signal dominated restricted the independent variable at the 5th and 95th percentile of each predictor the total signal, seasonal fluctuations are muted over a multi-year time period. variable. With those limits, 90% of the available data are displayed in the partial Thus, because of the temporal resolution of our arsenic measurements, we only dependence plots. This reduces the likelihood of presenting results which are considered the long-term signal. The resulting subsidence map is shown in Fig. 1 of unconstrained by data. In the Supplementary Information section, we presented all the paper. of the data, but displayed dashed lines indicating the 5th and 95th percentiles (Supplementary Figs. 1-4). Random forest model. Groundwater overpumping, as indicated by subsidence, is one of many complex mechanisms that influence groundwater arsenic con- Data availability. The geological, elevation, precipitation, temperature, ET, and tamiantion. In order to account for additional mechanisms, as well as the interplay water quality data have been cited throughout the paper, are listed in Supple- between various mechanisms, we employed the random forest model . Random mentary Table 1, and are publicly available. The SAR data used for processing forest models can account for nonlinear relationships between multiple variables interferograms are available from the Alaska Satellite Facility following registration. and handle outliers well . The random forest model creates ntree number of The datasets developed by the authors, i.e., the processed deformation maps, decision trees. Each decision tree makes an independent prediction of the variable estimated historic groundwater flow, and distance from each point to the nearest of interest, in our case arsenic concentration. The best estimate is the average of the river, as well as the code used to produce the results, are available upon request estimates from all decision trees. from the corresponding author. Each decision tree is built using a subset of the total data. The subset is selected using random sampling without replacement, or bagging. At each split (node), the variables to be considered for that node are also randomly sampled from the list of Received: 27 September 2017 Accepted: 26 April 2018 total variables. The values of the split are chosen to maximize the variance reduction, defined as the difference between the variance before the split and the sum of the variance of the points for which the split is true, and the variance of the points for which the split is false . The number of variables to be considered at each node is described by the variable mtry. The variables ntree and mtry are known as tuning parameters. The overall fit of the model is dependent on these tuning parameters. References In practice, increasing ntree both improves the accuracy of the results and 1. Margat, J. & Van der Gun, J. Groundwater around the world: a geographic increases the computational cost of the algorithm. Typically, increasing ntree does synopsis. CRC Press (2013). not significantly improve the results after roughly 100. To be conservative we chose 2. Faunt, C. C. et al. Groundwater Availability of the Central Valley Aquifer, a value of 500 for ntree. We determined the optimal value for mtry using a California. U.S. Geological Survey Professional Paper 1766 (USGS, 2009). validation dataset. We randomly selected 75% of the dataset to calibrate the 3. Welch, A. H., Helsel, D. R., Focazio, M. J. and Watkins, S. A. Arsenic in random forest model, and used the remaining 25% of the dataset as the validation ground water supplies of the United States. In Arsenic Exposure Health Effects to test the accuracy of the model. We chose the value for mtry that minimized the III, Proc. 3rd International Conference on Arsenic Exposure and Health Effects mean squared error (MSE) in the validation dataset. 9–17 (1999). We created two random forest models, one that predicted arsenic 4. Fendorf, S., Michael, H. A. & van Geen, A. Spatial and temporal variations of concentrations from 2007 to 2015 (recent) and one that predicted arsenic groundwater arsenic in South and Southeast Asia. Science 328, 1123–1127 concentrations from 1986 to 1993 (historic). The response variable was log (As). (2010). The variables used in calibrating both models were identical except that the recent 5. Yoshida, T., Yamauchi, H. & Sun, G. F. Chronic health effects in people subsidence from InSAR was not included in the historic arsenic concentrations exposed to arsenic via the drinking water: dose–response relationships in model. In addition, the concentrations of arsenic, manganese and sulfate used to review. Toxicol. Appl. Pharmacol. 198, 243–252 (2004). calibrate the models were taken from each respective time window (Supplementary NATURE COMMUNICATIONS (2018) 9:2089 DOI: 10.1038/s41467-018-04475-3 www.nature.com/naturecommunications 5 | | | ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04475-3 6. Smedley, P. L. & Kinniburgh, D. G. A review of the source, behaviour and differential SAR interferograms. In IEEE Int. Geoscience and Remote Sensing distribution of arsenic in natural waters. Appl. Geochem. 17, 517–568 (2002). Symposium 2375–2383 (2002). 7. Erban, L. E., Gorelick, S. M., Zebker, H. A. & Fendorf, S. Release of arsenic to 23. Breiman, L. Random forests. Mach. Learn. 45,5–32 (2001). deep groundwater in the Mekong Delta, Vietnam, linked to pumping-induced 24. Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. Classification and land subsidence. Proc. Natl Acad. Sci. USA 110, 13751–13756 (2013). Regression Trees (Wiley, 1984). 8. Poland, J. F., Ireland, R. L., Lofgren, B. E. & Pugh, R. G. Land subsidence in the 25. Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning. San Joaquin Valley, California, as of 1972. U.S. Geological Survey Professional Springer Series in Statistics 2nd edn (Springer, 2009). Paper 437-H (USGS, 1975). 9. Smith, R. G. et al. Estimating the permanent loss of groundwater storage in the Acknowledgements Southern San Joaquin Valley, California. Water Resour. Res. 53, 2133–2148 R.S.’s contribution was supported by a National Science Foundation Fellowship (Award (2017). Number DGE-114747). S.F.’s contribution was supported, in part, by the US Department 10. Farr, T. G., C. Jones, Z. Liu. Progress Report: Subsidence in California, March of Energy, Office of Biological and Environmental Research, Subsurface Biogeochemistry 2015 – September 2016. NASA Progress Report, 1–37 (2016). Program (Award Number DE-SC0016544) and the SLAC SFA Project, FWP 10094). We 11. Terzaghi, K. Structure and Volume of Voids in Soils. Translated from thank Joseph Ayotte and Kenneth Belitz for providing feedback on arsenic concentration Erdbaummechanik auf Bodenphysikalischer Grundlage, From Theory to databases in California. Practice in Soil Mechanics (John Wiley and Sons, New York, 1925). 12. Page, R. W. Geology of the fresh ground-water basin of the Central Valley, California, with texture maps and sections. U.S. Geological Survey Professional Author contributions Paper 1401-C (USGS, 1986). R.S., R.K., and S.F. conceived of the study. R.S. acquired, processed, and analyzed the 13. Ayotte, J. D., Nolan, B. T. & Gronberg, J. A. Predicting arsenic in drinking data. R.S. developed and interpreted the models with input from S.F. and R.K. R.S., R.K., water wells of the Central Valley, California. Environ. Sci. Technol. 50, and S.F. wrote the manuscript. 7555–7563 (2016). 14. Fujii, R. & Swain, W. C. Areal distribution of selected trace elements, salinity, and major ions in shallow groundwater, Tulare Basin, Southern San Joaquin Additional information Valley, California. U.S. Geological Survey Water-Resources Investigations Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467- Report 95-4048 (USGS, 1995). 018-04475-3. 15. Gao, S., Fujii, R., Chalmers, A. T. & Tanji, K.K Evaluation of adsorbed arsenic and potential contribution to shallow groundwater in Tulare Lake Bed Area, Competing interests: The authors declare no competing interests. Tulare Basin, California. Soil Sci. Soc. Am. J. 68,89–95 (2004). 16. Howard, T. Communities that rely on a contaminated groundwater source for Reprints and permission information is available online at http://npg.nature.com/ drinking water. State Water Resources Control Board Report to the reprintsandpermissions/ Legislature, https://www.waterboards.ca.gov/gama/ab2222/docs/ab2222.pdf (Sacramento, CA, 2013). Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in 17. Belitz, K., Dubrovsky, N. M., Burow, K., Jurgens, B., & Johnson, T. Framework published maps and institutional affiliations. for a Ground-Water Quality Monitoring and Assessment Program for California. U.S. Geological Survey Water-Resources Investigations Report 03- 4166 (USGS, 2003). 18. Farr, T. G. et al. The shuttle radar topography mission. Rev. Geophys. 45,1–33 Open Access This article is licensed under a Creative Commons (2007). Attribution 4.0 International License, which permits use, sharing, 19. Williamson, A. K., Prudic, D. E. & Swain, L. A. Ground-Water Flow in the adaptation, distribution and reproduction in any medium or format, as long as you give Central Valley, California: Regional Aquifer-System Analysis—Central Valley, appropriate credit to the original author(s) and the source, provide a link to the Creative California. U.S. Geological Survey Professional Paper 1401-D (USGS, 1989). Commons license, and indicate if changes were made. The images or other third party 20. Anderson, M. C., Norman, J. M., Mecikalski, J. R., Otkin, J. A. and Kustas, W. material in this article are included in the article’s Creative Commons license, unless P. A climatological study of evapotranspiration and moisture stress across the indicated otherwise in a credit line to the material. If material is not included in the continental United States based on thermal remote sensing: 1. Model article’s Creative Commons license and your intended use is not permitted by statutory formulation. J. Geophys. Res. Atmos. 112, https://doi.org/10.1029/ regulation or exceeds the permitted use, you will need to obtain permission directly from 2006JD007506 (2007). the copyright holder. To view a copy of this license, visit http://creativecommons.org/ 21. Chen, C. W. & Zebker, H. A. Two-dimensional phase unwrapping with use of licenses/by/4.0/. statistical models for cost functions in nonlinear optimization. J. Opt. Soc. Am. A. 18, 338–351 (2001). 22. Berardino, P., Fornaro, G., Lanari, R. & Sansosti, E. A new algorithm for © The Author(s) 2018 monitoring localized deformation phenomena based on small baseline 6 NATURE COMMUNICATIONS (2018) 9:2089 DOI: 10.1038/s41467-018-04475-3 www.nature.com/naturecommunications | | | http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Nature Communications Springer Journals

Overpumping leads to California groundwater arsenic threat

Free
6 pages

Loading next page...
 
/lp/springer_journal/overpumping-leads-to-california-groundwater-arsenic-threat-UUqbWr31e4
Publisher
Springer Journals
Copyright
Copyright © 2018 by The Author(s)
Subject
Science, Humanities and Social Sciences, multidisciplinary; Science, Humanities and Social Sciences, multidisciplinary; Science, multidisciplinary
eISSN
2041-1723
D.O.I.
10.1038/s41467-018-04475-3
Publisher site
See Article on Publisher Site

Abstract

ARTICLE DOI: 10.1038/s41467-018-04475-3 OPEN Overpumping leads to California groundwater arsenic threat 1 1 2 Ryan Smith , Rosemary Knight & Scott Fendorf Water resources are being challenged to meet domestic, agricultural, and industrial needs. To complement finite surface water supplies that are being stressed by changes in precipitation and increased demand, groundwater is increasingly being used. Sustaining groundwater use requires considering both water quantity and quality. A unique challenge for groundwater use, as compared with surface water, is the presence of naturally occurring contaminants within aquifer sediments, which can enter the water supply. Here we find that recent groundwater pumping, observed through land subsidence, results in an increase in aquifer arsenic concentrations in the San Joaquin Valley of California. By comparison, historic groundwater pumping shows no link to current groundwater arsenic concentrations. Our results support the premise that arsenic can reside within pore water of clay strata within aquifers and is released due to overpumping. We provide a quantitative model for using subsidence as an indicator of arsenic concentrations correlated with groundwater pumping. 1 2 Department of Geophysics, Stanford University, 397 Panama Mall, Stanford, California 94305, USA. Department of Earth System Science, Stanford University, 473 Via Ortega, Stanford, California 94305, USA. Correspondence and requests for materials should be addressed to R.S. (email: rgsmith@stanford.edu) NATURE COMMUNICATIONS (2018) 9:2089 DOI: 10.1038/s41467-018-04475-3 www.nature.com/naturecommunications 1 | | | 1234567890():,; ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04475-3 lobally, groundwater provides almost half of all drinking quantitative link between overpumping of groundwater systems water , making it one of the world’s most important and arsenic contamination, a mechanism that has been proposed Gresources. Within the United States, the Central Valley of previously but never statistically demonstrated. California accounts for roughly 20% of groundwater with- drawals . The Central Valley is an arid region that supports a $17 billion agricultural industry. In the southern, highly productive Results and Discussion region of the valley (known as the San Joaquin Valley), aquifers Modeling arsenic concentration. We integrated estimates of are particularly stressed due to the high water demands. Further, subsidence with additional variables known from previous stu- groundwater is the main source of drinking water for roughly one dies to affect arsenic levels into a random forest model that million people in the San Joaquin Valley. Herein, we focus on accounts for nonlinear relationships, as well as inter- threats to groundwater quality induced by naturally occurring dependencies of different variables. Two models were devel- arsenic, which may have devastating impacts on both human oped, one that predicts recent (2007 to 2015) arsenic con- health and food production. centrations and a second that predicts historic (1986 to 1993) Arsenic is a ubiquitous, naturally occurring contaminant that is arsenic concentrations. These dates were chosen because they a common problem in many aquifers that are pumped for both span long droughts in the San Joaquin Valley, during which drinking water ; most notably, it is presently having a devasting time appreciable groundwater decline and subsidence occurred impact on groundwater quality throughout Asia . When present (Supplementary Fig. 5). We did not include the mild-drought in significant amounts, it increases the risk of cancer, heart dis- conditions from 1999 to 2005, because we had limited subsidence ease, and diabetes . Hazardous levels of arsenic typically result data over that period, and because we considered more intense from anaerobic conditions, as noted for the shallow, Holocene droughts to produce a stronger signal for our analysis. We used aquifers of Asia , or from high pH (pH > 8.5) often observed in the output of the random forest models to assess the effect of high arsenic regions of the Andes . In addition, overpumping an overpumping as indicated by subsidence on arsenic levels. We aquifer system has been noted to increase the arsenic levels in found that in both models, concurrent subsidence markedly Southeast Asia by drawing arsenic from less-permeable anaerobic increases the risk of arsenic contamination. As both models clay strata into the aquifer . establish similar relationships (capturing the time periods of Arsenic within pumped groundwater of the San Joaquin Valley recent and past periods of drought), we focus our discussion on has been noted for decades. Approximately 10% of the wells the recent arsenic concentration model, pointing out where key tested within the last 10 years have shown arsenic (As) con- differences exist. The full results of both models are shown in the centrations above 10 µg/L (p.p.b.), the level recommended as the Supplementary Information (Supplementary Figs. 1-4) maximum acceptable by the World Health Organization (WHO). As groundwater is increasingly being pumped to meet agri- cultural and domestic needs, preserving groundwater resources Mechanism for arsenic contamination from overpumping. The (i.e., maintaining water quality) is imperative. Here we seek to relationship between subsidence and arsenic concentration is determine the source of arsenic contamination within ground- linked to clay strata (Fig. 1). It has been noted that clay particles water of the Central Valley, CA, and to develop quantitative become enriched in arsenic due to their reactivity and high sur- means for predicting degradation of groundwater quality. We face area to volume ratio relative to sand-sized particles . Arsenic focus our study on the Tulare basin (Fig. 1), a highly productive has been transported to the San Joaquin Valley from the Sierra agricultural region of the San Joaquin Valley where groundwater Nevada and coastal mountain ranges by rivers, which cut through is essential for meeting water demands. arsenic-bearing formations, for millions of years . Clays at or Groundwater pumping in the San Joaquin Valley has caused near the surface at the time of deposition are the primary host of declines of ~ 60 m in groundwater levels over the past century, transported arsenic, which has been shown to adsorb on clay leading to subsidence as high as ~ 9 m from 1925 to 1970, a rate 15 surfaces in significant amounts in the San Joaquin Valley ,ina of 20 cm/year . Rates of subsidence as high as 25 cm/year have process similar to what occurs in other sedimentary basins, been observed in more recent droughts spanning from 2007 to 4 including those throughout Asia . As the clays are buried over 9, 10 2010 and from 2012 to 2016 . Subsidence due to groundwater geologic time, their restricted oxygen supply results in reduction pumping is caused by a pore pressure drop in aquifer materials, 14 of the arsenic at depths > 60 m , resulting in dissolution of the which increases the effecive stress, σ ,defined as σ = σ – P , e e T p arsenic within the clay pore water . This mechanism for arsenic where σ is the total stress and P is the pore pressure . 13 T p concentration in clays is supported by other studies in the area , Increasing the effective stress results in aquifer compaction. The who have found a positive relationship between aquifer clay majority of compaction and linked subsidence of the overlying content and arsenic concentration in the San Joaquin Valley. ground surface is caused by drainage of clays due to their weaker When unperturbed, groundwater within the aquifer primarily geomechanical properties, as indicated by their higher skeletal flows horizontally through the sediments with highest perme- specific storage. Thus, the high subsidence levels provide a ability, typically sands and gravels. Thus, pumped groundwater measure of overpumping in aquifers having substantial clay comes mostly from sands and gravels, which have lower arsenic content, such as the aquifers of the San Joaquin Valley. In this concentrations. When the aquifer system is stressed from region, the upper 500 m, which is the greatest depth typically overpumping, high vertical hydraulic gradients cause a larger drilled for groundwater pumping, consists of alternating layers of volume of water to be drawn from less-permeable clays, inducing sand, gravel, and clay. In general, the aquifer system is divided the release of water with high arsenic concentrations. As a into an upper aquifer, a thick clay confining unit known as the consequence, overpumping increases arsenic concentrations, as Corcoran clay, and a lower aquifer. The upper and lower aquifers noted for the lower Mekong Delta, Vietnam . Concomitant with contain sands and gravels, as well as numerous thin clay layers . decreased pore pressures is compaction of the aquifer and As a measure of overpumping, we use subsidence, derived resulting land subsidence. Thus, the dual consequence of from Interferometric Synthetic Aperture Radar (InSAR) data, to overpumping is land subsidence and increased extraction of pore predict arsenic levels quantitatively. water from clay layers, giving rise to a link between land We find that subsidence data have a strong correlation with subsidence and groundwater arsenic concentrations. concurrent arsenic concentrations. This demonstrates a 2 NATURE COMMUNICATIONS (2018) 9:2089 DOI: 10.1038/s41467-018-04475-3 www.nature.com/naturecommunications | | | 0 NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04475-3 ARTICLE Land subsidence, 2007- 2010 (cm/year) 37°N Subsidence data extent Well arsenic concentrat- ion, 2007-2015 (p.p.b.) Corcoran clay extent 36°N 1000 121°W 120°W 119°W 118°W Fig. 1 Comparison of subsidence, arsenic concentration, and clay extent within the San Joaquin Valley, CA. Arsenic concentrations increase toward the center of the valley where subsidence is more extensive and a confining clay layer (known as the Corcoran clay) is present . The grayed-out region is the area where recent subsidence data (obtained from InSAR) were processed. Arsenic concentrations vary by orders of magnitude and are thus shown with a logarithmic color bar (see Supplementary Fig. 8 for arsenic concentration histograms). The arsenic concentration data points were sourced from . The basemap was created using elevation data from the Shuttle Radar Topography Mission Quantifying the impact of overpumping on groundwater easily seen in InSAR data, should improve water quality, at least arsenic concentrations. We highlighted four of the most for the San Joaquin Valley. important variables describing arsenic concentration within the The historic arsenic concentration model shows a weaker, but Tulare Basin in the recent model, shown in Fig. 2a-d. Of these, still positive, correlation between Corcoran clay thickness and the thickness of the Corcoran Clay (a confining unit that overlies arsenic concentration, as well as manganese concentration and a lower aquifer) shows a positive correlation with arsenic con- arsenic concentration. Due to the dynamic nature of groundwater centrations due to increased clay content. Elevation has a negative chemistry, the relative importance of different mechanisms for correlation, as lower areas are more likely to have been water- arsenic contamination change over time, highlighting the need for saturated and thus anaerobic. A positive correlation was found regular calibration with up to date datasets. between log (Mn) and arsenic concentrations, as the presence of To further quantify the impact of overpumping on arsenic manganese indicates an anoxic environment, in which arsenic concentrations, we simulated three different hypothetical scenar- tends to be more soluble. Significantly, recent subsidence from ios with our predictive random forest model: oxic, suboxic, and InSAR showed a positive correlation, as overpumping leads to anoxic subsurface conditions. We used a variation of the partial increased pore water drainage from clays. The first three variables dependence plot that averages all variables except subsidence, are well-known from the literature and not related to human which is varied from 0 to 10 cm/year, and log (Mn), which is set activity. The quantitative link between pumping-induced sub- at 0, 1, and 2 for oxic, suboxic, and anoxic scenarios, respectively, sidence and arsenic concentrations has not been shown before, to estimate the probability that the predicted arsenic levels and is directly related to human activity. exceeded the WHO’s standard of 10 µg/L arsenic (Fig. 3). At ca. 8 By comparing the historic and recent arsenic concentration cm/year subsidence, the probability that arsenic exceeds the models, we see the transient nature of the relationship between WHO standard dramatically accelerates. This inflection point is pumping-induced subsidence and arsenic concentrations (Fig. 2e). also present at roughly the same location in the historic arsenic Although historic subsidence has a high impact on historic model (Fig. 2e). As inelastic deformation of clay beds begins, the arsenic concentrations, it has virtually no impact on recent rate of subsidence increases significantly . As inelastic deforma- arsenic concentrations. A lack of correlation between present tion occurs when pore pressure drops below the lowest level arsenic concentrations and historic subsidence (related to over- previously experienced, it releases pore water that has not been pumping) suggests that arsenic levels due to overpumping slowly mixed with the main aquifer previously, which in the present case return to their original levels after the groundwater pumping is results in arsenic-rich pore water being captured within the decreased, implying that arsenic is flushed from the auqifer. Thus, pumped groundwater. our findings indicated that avoiding overpumping of aquifers, NATURE COMMUNICATIONS (2018) 9:2089 DOI: 10.1038/s41467-018-04475-3 www.nature.com/naturecommunications 3 | | | Elevation (m) 0 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04475-3 ab e 1.1 1.1 Predicted historic arsenic Predicted recent arsenic 0.9 0.9 2.0 0.7 0.7 0.5 0.5 1.5 0 10 20 30 40 0.0 0.5 1.0 1.5 2.0 2.5 Corcoran clay thickness (m) log (Mn) (p.p.b.) 1.0 cd 1.1 1.1 0.9 0.9 0.5 0.7 0.7 0.5 0.5 0.0 60 80 100 120 140 2.5 5.0 7.5 10.0 5 10 15 20 Elevation (m) Recent subsidence (cm/year) Modeled subsidence from 1986 to 1993 (cm/year) Fig. 2 Partial dependence plots of primary descriptors of groundwater arsenic concentrations. These include a confining clay thickness, b dissolved Mn concentrations, c elevation, and d recent subsidence. e Comparison of partial dependence plots of historic and recent subsidence. Note that historic subsidence has a great effect on historic arsenic levels but little effect on recent arsenic levels (e), whereas recent subsidence has a great effect on recent arsenic levels (d). All blue lines are derived from models predicting recent (2007–2015) arsenic concentrations, whereas the red line is derived from models predicting historic (1986–1993) arsenic concentrations Oxidation state 1 Oxic 2 Suboxic 3 Anoxic Methods Data acquisition and processing. There are many factors that control natural 0.3 variation in arsenic concentration. We developed a statistical model using numerous datasets to asses the role each plays as a predictor of arsenic con- centrations. We obtained arsenic level data, as well as proxy information on redox potential from manganese and sulfate concentrations, from 838 wells over the 2007 to 2015 time frame, and 424 wells over the 1986 to 1993 time frame, from the 0.2 GAMA database , which monitors groundwater quality in California (http:// geotracker.waterboards.ca.gov/gama/datadownload). We used a previously devel- oped groundwater model which provided us with several useful predictors of arsenic concentrations: percent of total water use from groundwater, which is directly related to pumping; total aquifer clay content, which sets an upper limit on 0.1 available arsenic from clays; top and bottom of perforated well interval, both of which indicate the depth from which pumping occurred, a soft indicator of the oxidation state; subsidence from 1962 to 1976 and subsidence from 1986 to 1993, which provide historic measures of pumping-induced clay drainage. Elevation was 036 9 also used as a predictor, obtained from the Shuttle Radar Topography Mission ; Land subsidence, 2007 to 2010 (cm/year) this is an indicator of closed-basin conditions, where contaminants such as arsenic tend to be concentrated. Slope was also derived from this dataset. As slope closely Fig. 3 Probability that arsenic levels exceeded the WHO standard. This was tracks predevelopment head , we used this with the clay content to estimate calculated based on recent subsidence for each arsenic risk category historic groundwater flow, which was also included as a predictor; our assumption being that enhanced groundwater flow would flush the system over time and reduce arsenic concentrations. We used average temperature data from PRISM; temperature variations have been related to changes in arsenic in previous work . Land subsidence due to overpumping increases the probability We also used evapotranspiration (ET) estimates (both from 2002 to 2007 and from that groundwater is contaminated beyond the WHO drinking 2007 to 2015) ; this can be used as a proxy of groundwater pumping that is independent of our InSAR measurements. water standard by a factor of 2 to 3 for the San Joaquin Valley. We chose not to include pesticide application as a predictor of groundwater Importantly, decreasing pumping below the threshold of arsenic contamination. Arsenic-bearing pesticides are largely restricted to lead inestastic aquifer compression will decrease arsenic concentra- arsenicals that were used predominantly on cotton and specific tree fruits (largely tions and the aquifers can recover to normal levels if over- apples) but have been banned since the 1970’s within the United States. Within the 15 4 aerated surface soils, arsenic exists as arsenate , and would have limited mobility . pumping is halted. Thus, subsidence maps produced by InSAR Thus, downward migration of surface applied arsenic is unlikely to contribute to provide a means of measuring the increased risk of arsenic groundwater contamination. Further, our analysis shows a correlation of contamination due to overpumping of aquifers, a critically groundwater arsenic with recent (last 10 years) subsidence and no correlation with important factor as groundwater is the main source of drinking older periods of subsidence. Our time series therefore further removes the possibility of arsenic pesticides impacting groundwater. water for roughly one million people in our study area alone . The arsenic levels, manganese levels and sulfate levels were obtained from Moreover, with a global trend toward increase use of ground- GAMA. These levels were assumed to have a lowest detectable value of 2 p.p.b. water, effectively managing water quality with quantity is from histogram analysis, so all values less than this were set to 2 p.p.b. As essential to preserve the use of this critical resource. measurements of arsenic, sulfate, and manganese vary greatly with high outliers, 4 NATURE COMMUNICATIONS (2018) 9:2089 DOI: 10.1038/s41467-018-04475-3 www.nature.com/naturecommunications | | | Predicted log (As) Predicted log (As) 10 10 Probability As exceeds WHO standards Predicted log (As) Predicted log (As) 10 10 Increase in predicted As concentration (p.p.b.) NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04475-3 ARTICLE Table 1). The resulting values for mtry were both 5, with an MSE of 0.05 and 0.10, Table 1 Summary of scenes used for InSAR processing for the recent and historic random forest models, respectively. We used these calibrated, tuned random forest models to predict arsenic concentrations and establish relationships between the variables used in the model and arsenic Frame (region) Path No. of scenes No. of interferograms concentrations. Tulare 218 15 82 Random forests are one of many machine learning regression algorithms. In Fresno 218 17 71 this study, we also developed models using neural networks, support vector Mendota 219 19 82 machines, and boosted gradient trees. Our analysis showed that random forests provided the best fit to the observed data. However, similar relationships between Porterville 217 17 84 subsidence and arsenic were observed with these additional machine learning methods. the logarithm of each was used in the statistical model to dampen the effect of Variable importance. With random forests, the impact of each variable in the outliers. outcome can be assessed in multiple ways. One way is to randomly shuffle (per- The historic groundwater flow was calculated using hydraulic conductivity (K) mute) each variable, and observe the increase in MSE between predicted and estimates across the Central Valley from the aforementioned groundwater model . observed arsenic concentrations. The higher the MSE, the more important that ∂h We assumed that the historic groundwater gradient ( ) was equal to the land variable is to the outcome. Another way to assess variable importance is to use ∂l surface gradient . We then used Darcy’s equation to estimate flow: partial dependence plots. ∂h q ¼K ð1Þ Partial dependence plots. Partial dependence plots are a common way to visualize ∂l random forest models . The goal of a partial dependence plot is to show how varying one variable impacts the outcome of the prediction, while accounting for where q is the flux per unit area. all possible values from the other variables. The partial dependence plot is calcu- The distance to the nearest river was computed by determining the distance lated as shown below: from each well to all major rivers in the study area, and taking the minimum distance as the distance to the nearest river. fðxÞ¼ fðx; x Þ ð2Þ ic i¼1 InSAR processing. InSAR provides high-quality estimates of subsidence over large regions. InSAR measures subsidence over time with centimeter- to millimeter-scale where fðÞ x is the partial dependence, n is the number of rows of the dataset, x is accuracy over large regions with high (10 s to 100 s of meters) spatial resolution. the predictor variable of interest, x are the values of all other variables, and f is a We acquired 68 SAR scenes from the ALOS PALSAR sensor, covering the time ic function representing the random forest model The output can either be the best period from 2007 to 2010, over four frames (see Table 1) for InSAR processing estimate of log (As), or the probability that arsenic concentration exceeded 10 p.p. from the Alaska Satellite Facility (https://www.asf.alaska.edu/). We then processed 10 b. For the latter, we used all trees from the random forest model. Our random 319 interferograms using these scenes. We used 30 looks in azimuth and 10 looks forest model produced ‘best guess’ estimates of arsenic levels, which are the mean in range to reduce noise in the pixels, resulting in a pixel size of roughly ~ 250 m by of the estimates from all trees. It also produced an estimate for each of the 500 250 m. We unwrapped the interferograms using the snaphu code . Next, we used trees, for each observation. This provided us with a distribution which we used to the small baseline subset method to estimate the long-term deformation signal. determine the probability that the predicted arsenic levels exceeded the WHO This produced maps of the mean subsidence velocity for each of the four frames, standard. which we merged to create one map of mean subsidence velocity. Overlapping Random forest models are limited in their predictions to observed data. For this areas were averaged. We then applied a moving average filter to further smooth the reason, in the partial dependence plots of the main article (Figs. 2 and 3), we results, with a window size of 15 by 15 pixels. As the long-term signal dominated restricted the independent variable at the 5th and 95th percentile of each predictor the total signal, seasonal fluctuations are muted over a multi-year time period. variable. With those limits, 90% of the available data are displayed in the partial Thus, because of the temporal resolution of our arsenic measurements, we only dependence plots. This reduces the likelihood of presenting results which are considered the long-term signal. The resulting subsidence map is shown in Fig. 1 of unconstrained by data. In the Supplementary Information section, we presented all the paper. of the data, but displayed dashed lines indicating the 5th and 95th percentiles (Supplementary Figs. 1-4). Random forest model. Groundwater overpumping, as indicated by subsidence, is one of many complex mechanisms that influence groundwater arsenic con- Data availability. The geological, elevation, precipitation, temperature, ET, and tamiantion. In order to account for additional mechanisms, as well as the interplay water quality data have been cited throughout the paper, are listed in Supple- between various mechanisms, we employed the random forest model . Random mentary Table 1, and are publicly available. The SAR data used for processing forest models can account for nonlinear relationships between multiple variables interferograms are available from the Alaska Satellite Facility following registration. and handle outliers well . The random forest model creates ntree number of The datasets developed by the authors, i.e., the processed deformation maps, decision trees. Each decision tree makes an independent prediction of the variable estimated historic groundwater flow, and distance from each point to the nearest of interest, in our case arsenic concentration. The best estimate is the average of the river, as well as the code used to produce the results, are available upon request estimates from all decision trees. from the corresponding author. Each decision tree is built using a subset of the total data. The subset is selected using random sampling without replacement, or bagging. At each split (node), the variables to be considered for that node are also randomly sampled from the list of Received: 27 September 2017 Accepted: 26 April 2018 total variables. The values of the split are chosen to maximize the variance reduction, defined as the difference between the variance before the split and the sum of the variance of the points for which the split is true, and the variance of the points for which the split is false . The number of variables to be considered at each node is described by the variable mtry. The variables ntree and mtry are known as tuning parameters. The overall fit of the model is dependent on these tuning parameters. References In practice, increasing ntree both improves the accuracy of the results and 1. Margat, J. & Van der Gun, J. Groundwater around the world: a geographic increases the computational cost of the algorithm. Typically, increasing ntree does synopsis. CRC Press (2013). not significantly improve the results after roughly 100. To be conservative we chose 2. Faunt, C. C. et al. Groundwater Availability of the Central Valley Aquifer, a value of 500 for ntree. We determined the optimal value for mtry using a California. U.S. Geological Survey Professional Paper 1766 (USGS, 2009). validation dataset. We randomly selected 75% of the dataset to calibrate the 3. Welch, A. H., Helsel, D. R., Focazio, M. J. and Watkins, S. A. Arsenic in random forest model, and used the remaining 25% of the dataset as the validation ground water supplies of the United States. In Arsenic Exposure Health Effects to test the accuracy of the model. We chose the value for mtry that minimized the III, Proc. 3rd International Conference on Arsenic Exposure and Health Effects mean squared error (MSE) in the validation dataset. 9–17 (1999). We created two random forest models, one that predicted arsenic 4. Fendorf, S., Michael, H. A. & van Geen, A. Spatial and temporal variations of concentrations from 2007 to 2015 (recent) and one that predicted arsenic groundwater arsenic in South and Southeast Asia. Science 328, 1123–1127 concentrations from 1986 to 1993 (historic). The response variable was log (As). (2010). The variables used in calibrating both models were identical except that the recent 5. Yoshida, T., Yamauchi, H. & Sun, G. F. Chronic health effects in people subsidence from InSAR was not included in the historic arsenic concentrations exposed to arsenic via the drinking water: dose–response relationships in model. In addition, the concentrations of arsenic, manganese and sulfate used to review. Toxicol. Appl. Pharmacol. 198, 243–252 (2004). calibrate the models were taken from each respective time window (Supplementary NATURE COMMUNICATIONS (2018) 9:2089 DOI: 10.1038/s41467-018-04475-3 www.nature.com/naturecommunications 5 | | | ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-018-04475-3 6. Smedley, P. L. & Kinniburgh, D. G. A review of the source, behaviour and differential SAR interferograms. In IEEE Int. Geoscience and Remote Sensing distribution of arsenic in natural waters. Appl. Geochem. 17, 517–568 (2002). Symposium 2375–2383 (2002). 7. Erban, L. E., Gorelick, S. M., Zebker, H. A. & Fendorf, S. Release of arsenic to 23. Breiman, L. Random forests. Mach. Learn. 45,5–32 (2001). deep groundwater in the Mekong Delta, Vietnam, linked to pumping-induced 24. Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. Classification and land subsidence. Proc. Natl Acad. Sci. USA 110, 13751–13756 (2013). Regression Trees (Wiley, 1984). 8. Poland, J. F., Ireland, R. L., Lofgren, B. E. & Pugh, R. G. Land subsidence in the 25. Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning. San Joaquin Valley, California, as of 1972. U.S. Geological Survey Professional Springer Series in Statistics 2nd edn (Springer, 2009). Paper 437-H (USGS, 1975). 9. Smith, R. G. et al. Estimating the permanent loss of groundwater storage in the Acknowledgements Southern San Joaquin Valley, California. Water Resour. Res. 53, 2133–2148 R.S.’s contribution was supported by a National Science Foundation Fellowship (Award (2017). Number DGE-114747). S.F.’s contribution was supported, in part, by the US Department 10. Farr, T. G., C. Jones, Z. Liu. Progress Report: Subsidence in California, March of Energy, Office of Biological and Environmental Research, Subsurface Biogeochemistry 2015 – September 2016. NASA Progress Report, 1–37 (2016). Program (Award Number DE-SC0016544) and the SLAC SFA Project, FWP 10094). We 11. Terzaghi, K. Structure and Volume of Voids in Soils. Translated from thank Joseph Ayotte and Kenneth Belitz for providing feedback on arsenic concentration Erdbaummechanik auf Bodenphysikalischer Grundlage, From Theory to databases in California. Practice in Soil Mechanics (John Wiley and Sons, New York, 1925). 12. Page, R. W. Geology of the fresh ground-water basin of the Central Valley, California, with texture maps and sections. U.S. Geological Survey Professional Author contributions Paper 1401-C (USGS, 1986). R.S., R.K., and S.F. conceived of the study. R.S. acquired, processed, and analyzed the 13. Ayotte, J. D., Nolan, B. T. & Gronberg, J. A. Predicting arsenic in drinking data. R.S. developed and interpreted the models with input from S.F. and R.K. R.S., R.K., water wells of the Central Valley, California. Environ. Sci. Technol. 50, and S.F. wrote the manuscript. 7555–7563 (2016). 14. Fujii, R. & Swain, W. C. Areal distribution of selected trace elements, salinity, and major ions in shallow groundwater, Tulare Basin, Southern San Joaquin Additional information Valley, California. U.S. Geological Survey Water-Resources Investigations Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467- Report 95-4048 (USGS, 1995). 018-04475-3. 15. Gao, S., Fujii, R., Chalmers, A. T. & Tanji, K.K Evaluation of adsorbed arsenic and potential contribution to shallow groundwater in Tulare Lake Bed Area, Competing interests: The authors declare no competing interests. Tulare Basin, California. Soil Sci. Soc. Am. J. 68,89–95 (2004). 16. Howard, T. Communities that rely on a contaminated groundwater source for Reprints and permission information is available online at http://npg.nature.com/ drinking water. State Water Resources Control Board Report to the reprintsandpermissions/ Legislature, https://www.waterboards.ca.gov/gama/ab2222/docs/ab2222.pdf (Sacramento, CA, 2013). Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in 17. Belitz, K., Dubrovsky, N. M., Burow, K., Jurgens, B., & Johnson, T. Framework published maps and institutional affiliations. for a Ground-Water Quality Monitoring and Assessment Program for California. U.S. Geological Survey Water-Resources Investigations Report 03- 4166 (USGS, 2003). 18. Farr, T. G. et al. The shuttle radar topography mission. Rev. Geophys. 45,1–33 Open Access This article is licensed under a Creative Commons (2007). Attribution 4.0 International License, which permits use, sharing, 19. Williamson, A. K., Prudic, D. E. & Swain, L. A. Ground-Water Flow in the adaptation, distribution and reproduction in any medium or format, as long as you give Central Valley, California: Regional Aquifer-System Analysis—Central Valley, appropriate credit to the original author(s) and the source, provide a link to the Creative California. U.S. Geological Survey Professional Paper 1401-D (USGS, 1989). Commons license, and indicate if changes were made. The images or other third party 20. Anderson, M. C., Norman, J. M., Mecikalski, J. R., Otkin, J. A. and Kustas, W. material in this article are included in the article’s Creative Commons license, unless P. A climatological study of evapotranspiration and moisture stress across the indicated otherwise in a credit line to the material. If material is not included in the continental United States based on thermal remote sensing: 1. Model article’s Creative Commons license and your intended use is not permitted by statutory formulation. J. Geophys. Res. Atmos. 112, https://doi.org/10.1029/ regulation or exceeds the permitted use, you will need to obtain permission directly from 2006JD007506 (2007). the copyright holder. To view a copy of this license, visit http://creativecommons.org/ 21. Chen, C. W. & Zebker, H. A. Two-dimensional phase unwrapping with use of licenses/by/4.0/. statistical models for cost functions in nonlinear optimization. J. Opt. Soc. Am. A. 18, 338–351 (2001). 22. Berardino, P., Fornaro, G., Lanari, R. & Sansosti, E. A new algorithm for © The Author(s) 2018 monitoring localized deformation phenomena based on small baseline 6 NATURE COMMUNICATIONS (2018) 9:2089 DOI: 10.1038/s41467-018-04475-3 www.nature.com/naturecommunications | | |

Journal

Nature CommunicationsSpringer Journals

Published: Jun 5, 2018

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off