Abstract

Objectives
To design a method that uses preliminary hazard mapping data to optimize the number and location of sensors within a network for a long-term assessment of occupational concentrations, while preserving temporal variability, accuracy, and precision of predicted hazards.

Methods
Particle number concentrations (PNCs) and respirable mass concentrations (RMCs) were measured with direct-reading instruments in a large heavy-vehicle manufacturing facility at 80–82 locations during 7 mapping events, stratified by day and season. Using kriged hazard mapping, a statistical approach identified optimal orders for removing locations to capture temporal variability and high prediction precision of PNCs and RMCs. We compared optimal-removal, random-removal, and least-optimal-removal orders to bound prediction performance.

Results
The temporal variability of PNC was higher than that of RMC, with low correlation between the two particulate metrics (ρ = 0.30). Optimal-removal orders resulted in more accurate PNC kriged estimates (root mean square error [RMSE] = 49.2) at sample locations compared with random-removal order (RMSE = 55.7). For estimates at locations having concentrations in the upper 10th percentile, the optimal-removal order preserved average estimated concentrations better than random- or least-optimal-removal orders (P < 0.01). However, estimated average concentrations using an optimal-removal order were not statistically different from those using random-removal when averaged over the entire facility. No statistical difference was observed between optimal- and random-removal methods for RMCs, which were less variable in time and space than PNCs.

Conclusions
Optimized removal performed better than random-removal in preserving high temporal variability and accuracy of the hazard map for PNC, but not for the more spatially homogeneous RMC.
These results can be used to reduce the number of locations used in a network of static sensors for long-term monitoring of hazards in the workplace, without sacrificing prediction performance.

Keywords: hazard mapping, kriging, optimization, particulate matter, sampling

Introduction

Hazard mapping is a powerful research tool to depict relative levels of harmful physical or chemical workplace agents (Koehler and Volckens, 2011). In a hazard mapping event, technicians measure hazard concentrations with direct-reading instruments (DRIs) over a short time (1–5 min) as they move to predetermined locations throughout a facility. A map is generated from these data to depict the spatial distribution of the hazard. Multiple mapping events may be used across time of day or seasons to understand temporal variability. Mapping provides a representation of spatial or temporal trends that is otherwise difficult to depict in complex occupational settings (Park et al., 2010).

Hazard maps are used for a variety of purposes, including risk classification and informing regulatory compliance. They have helped isolate the source of hazards within a workplace (O’Brien, 2003), assess the efficacy of technological controls (Dasch et al., 2005), and characterize the distribution of toxins across space and time (Peters et al., 2006, 2012). In some cases, a hazard map is a pre-survey tool to identify optimal regions for more comprehensive exposure assessment (Dasch et al., 2005).

Several methods are used to map occupational hazards. The most basic approach focuses on sampling hazards at worker locations without predicting exposures elsewhere (Chen et al., 2009). Other methods rely on automated software where inputted locations and concentrations produce hazard contour maps (O’Brien, 2003; Liu and Hammond, 2010). A more complex approach utilizes the geostatistical kriging method to predict concentrations at unsampled locations.
This approach relies on characterizing spatial structure using a variogram function to weight nearby locations more heavily than those farther away (Cressie, 1993). Advantages of kriging are (i) a best linear and unbiased prediction and (ii) an error term allowing the quantification of prediction variability (Diggle, 1998). Kriging has been applied in assessments for industrial noise (Lake et al., 2015), aerosols (Peters et al., 2006), temperature, and gases (Peters et al., 2012). A fundamental issue with hazard mapping is that sampled data may not present a complete or representative picture of true exposures (Koehler and Volckens, 2011). Industrial processes, facility structures, and location, season, or time of sampling day, may introduce uncertainty in the predicted map (Peters et al., 2006; Heitbrink et al., 2007; Clerc and Vincent, 2014; Lake et al., 2015). Two major factors that drive uncertainties in the estimation of hazard maps are related to spatial and temporal variability of measured exposures. Spatial uncertainty is directly related to data collection, where inherently limited samples necessitate interpolations at unmeasured locations. We can minimize spatial uncertainty by increasing the sample size or targeting specific areas, which is especially critical when aerosols, gases, and other hazards display spatial variation (Beelen et al., 2009; Peters et al., 2012). Temporal uncertainty is associated with concentration variability across time. A short-duration measurement may not adequately capture the range of concentrations to assess long-term risk (Koehler et al., 2017). Temporal variability plays an important role in the error rate of predicted hazards, and it was recommended regions of high temporal variability be targeted with repeated measurements during a sampling campaign (Lake et al., 2015). Dasch et al. (2005) and Lake et al. (2015) applied a multi-tiered approach to reduce spatial and temporal uncertainties. 
The multi-tiered approach uses hazard mapping to capture spatial trends, repeated across time of day or seasons to understand temporal variability. Then, based on the mapped hazards, multiple lower-cost static monitors are deployed in areas with high hazard intensity or temporal variability for long-term assessment (Park et al., 2010). Extending this concept, a low-cost network of sensors could be used within a hazard-monitoring network to obtain time series of hazard maps. However, an unresolved issue is how to select the minimum number of sensors and their locations to best quantify spatial and temporal uncertainty (Koehler and Volckens, 2011; Koehler and Peters, 2012).

The objective of this research was to design a method to optimize a network of stationary sensors for long-term exposure assessment. We focused on a heavy-vehicle manufacturing facility, where particle number concentrations (PNCs) and respirable mass concentrations (RMCs) were measured by hazard mapping seven times, stratified by day and season. We devised a rubric to systematically remove locations for long-term deployment of low-cost sensors, while preserving the ability of maps to reflect concentrations and temporal variability with greater accuracy and precision. This research presents a unique method to help researchers design ideal sampling networks for future exposure assessments of occupational hazards. The method is applicable to any spatially distributed hazard, although our example used particulates to demonstrate the approach.

Methods

Hazard mapping

In 2015, a hazard mapping campaign was conducted to collect preliminary data on particulate hazards within a large area of a heavy-vehicle manufacturing facility (81 500 m²; ~877 000 ft²). In this area, torches or lasers cut parts from large metal sheets. Welders then manually tacked these parts, and robots completed the welding process to form vehicle components. Features in the components required for later assembly (e.g.
threaded holes) were machined in automatic machining centers. Except for the laser cutting machines, which were fully enclosed and ventilated, the factory relied on general ventilation.

To assess daily and seasonal variability, we conducted seven mapping events: two morning and two afternoon events in January (n = 4), and two morning and one afternoon event in May (n = 3). Following Peters et al. (2006), each mapping event consisted of measuring PNCs (per cm³) and RMCs (mg/m³) at 82 possible locations. Two direct-reading sensors were placed on separate mobile carts, and each device sampled PNC and RMC for 1-min intervals at unique locations. Concentrations were averaged at each location, and the carts were moved to the next location until measurements were obtained at all locations (~2 h). The samplers were robustly deployed, and locations were chosen a priori to maximize spatial coverage within the cutting, welding, and machining areas of the facility. Sample sites were restricted to locations accessible to the study team, which limited us to somewhat regular interval distances and reduced the number of locations visited during some campaigns: all 82 sites were visited in 4 campaigns; 81 sites were visited in 1 campaign; and 80 were visited in 2 campaigns.

For each mapping event, we produced a hazard map of estimated concentrations using kriging to interpolate best linear and unbiased predictions of PNC and RMC at unsampled locations. The kriging function is

$$\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i) \qquad (1)$$

where $\hat{Z}(s_0)$ is the predicted value at unobserved location $s_0$; $Z(s_i)$, $i = 1, \ldots, n$, are the sampled data $z(s_1), z(s_2), \ldots, z(s_n)$; and $\lambda_i$ are the spatial weights, chosen to minimize the prediction variance given the spatial autocorrelation structure defined by the variogram (Cressie, 1993). Dependence among residuals is modeled by parameterizing spatial correlation as a decreasing function of distance (Diggle, 1998).
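As a rough illustration of the kriging predictor in equation (1), the following Python sketch solves the ordinary kriging system for a single prediction point using a spherical semivariogram. It is a minimal sketch under stated assumptions, not the authors' implementation (the study used R with the ‘sp’ and ‘gstat’ packages): the function names, the coordinates, and the variogram parameters (nugget 0.1, total sill 1.0, range 30) are all hypothetical.

```python
import math


def spherical_gamma(h, nugget, sill, rng):
    """Spherical semivariogram; `sill` is the total sill (nugget included)."""
    if h == 0.0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_krige(sites, values, target, nugget, sill, rng):
    """Return (prediction, kriging variance) at `target`, as in equation (1)."""
    n = len(sites)

    def g(p, q):
        return spherical_gamma(math.dist(p, q), nugget, sill, rng)

    # Semivariances between sampled sites, bordered by the unbiasedness
    # constraint sum(lambda_i) = 1 (enforced with a Lagrange multiplier).
    a = [[g(sites[i], sites[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [g(s, target) for s in sites] + [1.0]
    sol = solve(a, b)
    lam, mu = sol[:n], sol[n]
    pred = sum(w * z for w, z in zip(lam, values))
    var = sum(w * gi for w, gi in zip(lam, b[:n])) + mu
    return pred, var


# Hypothetical example: four sites on a 10 m square, predicted at the center.
sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
values = [1.0, 2.0, 3.0, 4.0]
print(ordinary_krige(sites, values, (5.0, 5.0), 0.1, 1.0, 30.0))
```

The returned variance is the error term that lets kriging quantify prediction variability. In practice the variogram parameters would be fit to each mapping event rather than fixed by hand.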
Spatial correlation parameters (range, sill, and nugget) were defined using a spherical semivariogram function fit for each hazard mapping event and pollutant. Koehler and Peters (2012) provide a description of kriging for occupational settings.

To investigate the temporal variability of pollutants, we aggregated our data by season (January versus May) and for all days combined. We estimated the standard deviation (SD) of the measured concentrations at each sample location. Greater values indicate locations with large daily and/or seasonal variability, and small values indicate locations with low daily and/or seasonal variability. To evaluate temporal variation in space, we used semivariograms to determine the spatial dependence structure of the estimated SD. Kriging was applied to predict SDs of PNCs and RMCs. To compare heterogeneity across the predictions of the two pollutants, we used a Fligner-Killeen test on the kriged PNC and RMC SDs (Bottenheim et al., 1994). Since the pollutants have different scales, we normalized each grid cell as a proportion of the summed SDs. A two-sided Student's t-test was used to test whether the predicted SDs differed significantly between warm and cold seasons and to compare prediction performance across removal orders.

Sample location removal orders

The preliminary mapping campaign was performed with a limited number of accurate, but high-cost, DRIs. While the measurements are spatially robust, this approach would be time- and cost-prohibitive for long-term sampling. We devised a methodology to determine which locations could be removed from the 82-location mapping campaign, while maintaining accuracy and precision of the resulting hazard maps. We evaluated three methods: (i) optimal-removal order, to produce the best hazard map; (ii) random-removal order; and (iii) least-optimal-removal order, to produce the worst hazard map.
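The temporal-variability summary that drives these removal orders can be sketched in a few lines of Python. This is an illustrative sketch only: the location labels and concentrations are made up, the helper names are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, since the text does not state which SD estimator was used.

```python
import statistics


def temporal_sd(measurements):
    """Per-location SD across mapping events.

    `measurements` maps a location label to its list of concentrations,
    one per mapping event. Sample SD (n - 1) is an assumption here.
    """
    return {
        loc: statistics.stdev(vals)
        for loc, vals in measurements.items()
        if len(vals) >= 2
    }


def normalize(sds):
    """Express each SD as a proportion of the summed SDs, so pollutants
    measured on different scales can be compared."""
    total = sum(sds.values())
    return {loc: sd / total for loc, sd in sds.items()}


# Hypothetical PNC readings (x1000 per cm3) at three locations, seven events.
pnc = {
    "A": [120, 95, 180, 140, 210, 88, 132],
    "B": [60, 62, 58, 65, 61, 59, 63],
    "C": [300, 150, 410, 220, 505, 130, 275],
}
sd = temporal_sd(pnc)
print(sd)
print(normalize(sd))
```

In the study the normalization was applied to kriged grid cells rather than to raw sample locations; the same proportion-of-total calculation applies in either case.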
The best hazard map was defined as one that prioritizes locations with (1) high standard deviations (i.e. large temporal variability in measured concentrations) and (2) high prediction precision (i.e. low kriging variance). Thus, an optimal-removal order method first removes locations contributing least to these characteristics. A novel iterative approach was designed using the 7-measurement SDs at the 82 sample locations to identify optimal locations for removal. As an overview, we applied a leave-one-out method that starts with all sample sites, temporarily removes a single data point (in this case, location i from n sensor locations, where n is the total number of sites), and performs calculations on the reduced data set. The removed data point i is then added back and a different data point temporarily removed with calculations re-run. As each location i was removed, we performed the following steps:

Step 1. Krige the facility surface using the SD values at the sample sites that remain once location i is left out, and

a. Average the predicted SD across the entire surface,

$$\bar{Z}_i = \frac{1}{n}\sum_{j=1}^{n} \hat{Z}_{-i}(s_{0j}) \qquad (2)$$

where $\bar{Z}_i$ represents the mean kriged SD estimate with the ith sensor location removed, with the sum taken over the kriged surface.

b. Average the kriging variance of the predicted SD (i.e. the uncertainty of kriged values) across the entire surface,

$$\bar{Z}_{\sigma i} = \frac{1}{n}\sum_{j=1}^{n} \sigma^2_{-i}(s_{0j}) \qquad (3)$$

where $\bar{Z}_{\sigma i}$ represents the mean kriging variance with the ith sensor location removed.

c. Repeat (1a) and (1b) until all n locations have been ‘left out’.

Step 2. Out of the n values in Step 1, identify the maximum and minimum from Step 1a ($\bar{Z}_{i,\max}$; $\bar{Z}_{i,\min}$) and the maximum and minimum from Step 1b ($\bar{Z}_{\sigma i,\max}$; $\bar{Z}_{\sigma i,\min}$).

Step 3. For each ith of n removed locations, calculate

$$\phi_i = \frac{\bar{Z}_i - \bar{Z}_{i,\min}}{\bar{Z}_{i,\max} - \bar{Z}_{i,\min}} + \left|\frac{\bar{Z}_{\sigma i} - \bar{Z}_{\sigma i,\min}}{\bar{Z}_{\sigma i,\max} - \bar{Z}_{\sigma i,\min}} - 1\right| \qquad (4)$$

where a proportion closer to 1 for $(\bar{Z}_i - \bar{Z}_{i,\min})/(\bar{Z}_{i,\max} - \bar{Z}_{i,\min})$ indicates that removing the ith location results in a kriged surface with higher average predicted SD values.
For $\left|(\bar{Z}_{\sigma i} - \bar{Z}_{\sigma i,\min})/(\bar{Z}_{\sigma i,\max} - \bar{Z}_{\sigma i,\min}) - 1\right|$, a value closer to 1 indicates that removing the ith location results in a kriged surface with lower prediction variance (i.e. better precision). When the two values are added ($\phi_i$), the ith location with the largest summed value represents the optimal sample location to remove, as its removal will result in a hazard map with high SD predictions and low kriging variance. This ith location is then permanently removed from the pool of sample locations.

Step 4. We repeated Steps 1–3 with the iteratively reduced sensor locations, until 75% of locations (62 sensors) had been permanently removed. The 75% threshold was selected because 20 or fewer total data points would not provide adequate statistical power for a kriging assessment.

A least-optimal-removal order method followed a similar procedure, except that in Step 3 the location with the lowest $\phi_i$ is removed. The resulting map then represents low SD predictions and high kriging variance. This ith sample location represents the most influential spot for hazard mapping and would be considered ‘least-optimal’ to remove. A least-optimal-removal order would result in ‘the worst’ hazard map, which, while not practical for real-world applications, represents a useful comparison tool for demonstrating the impact of removing influential sample sites. The optimal- and least-optimal-removal orders were separately determined for PNC and RMC hazards. Further statistical description of the kriging approach can be found in Diggle (1998) and Waller and Gotway (2004).

To evaluate the prediction performance when sample locations are randomly removed, we created 10 unique sets of ‘random removal orders.’ Multiple sets were used to capture the range of possible outcomes when sites are randomly selected.

Prediction performance following sample location removal

To investigate prediction accuracy, we used kriging to predict PNC and RMC at sample locations following their removal.
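Before moving on to the accuracy metrics, the removal rubric of Steps 1–4 can be sketched as a greedy loop in Python. This is a hedged sketch, not the authors' code: `phi_scores`, `removal_order`, and the `summarize` callback are hypothetical names. In a real application `summarize` would krige the SD surface from the remaining sites and return the surface-averaged prediction and kriging variance of equations (2) and (3); the toy version below is a deliberately simple stand-in that does no kriging.

```python
def phi_scores(mean_sd, mean_var):
    """Equation (4): min-max scale both leave-one-out summaries and add them.

    mean_sd[k] and mean_var[k] are the surface averages obtained with
    candidate k left out. Higher phi marks a better candidate for removal.
    """
    def scale(xs):
        lo, hi = min(xs), max(xs)
        span = hi - lo
        return [(x - lo) / span if span > 0 else 0.0 for x in xs]

    return [a + abs(b - 1.0) for a, b in zip(scale(mean_sd), scale(mean_var))]


def removal_order(sites, summarize, n_keep, optimal=True):
    """Greedy Steps 1-4: permanently drop one site per pass until n_keep remain.

    `summarize(remaining)` must return (mean predicted SD, mean kriging
    variance) for a surface built from `remaining`.
    """
    remaining = list(sites)
    order = []
    pick = max if optimal else min  # least-optimal order takes the lowest phi
    while len(remaining) > n_keep:
        stats = [summarize([s for s in remaining if s != cand])
                 for cand in remaining]
        phi = phi_scores([m for m, _ in stats], [v for _, v in stats])
        idx = pick(range(len(remaining)), key=phi.__getitem__)
        order.append(remaining.pop(idx))
    return order


# Toy stand-in for kriging (assumption): surface mean SD is the mean of the
# remaining site SDs, and the variance depends only on how many sites remain.
site_sd = {"a": 5.0, "b": 1.0, "c": 3.0, "d": 9.0}


def toy_summarize(remaining):
    mean_sd = sum(site_sd[s] for s in remaining) / len(remaining)
    return mean_sd, 1.0 / len(remaining)


print(removal_order(site_sd, toy_summarize, n_keep=1))
print(removal_order(site_sd, toy_summarize, n_keep=1, optimal=False))
```

With this stand-in, the optimal order removes the lowest-SD sites first and the least-optimal order removes the highest-SD sites first, which mirrors the intent of the rubric. Because each pass re-evaluates every remaining candidate, the cost grows roughly with the square of the number of sites times the cost of one kriging run.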
We evaluated the accuracy of these estimates using root mean square error (RMSE)

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(z_i - \hat{z}_i\right)^2} \qquad (5)$$

where $z_i$ denotes the measured value at location i, $\hat{z}_i$ is the predicted value at location i, and n represents the 82 sample locations. The RMSE summarizes the typical magnitude of the difference between measured and predicted concentrations, where a smaller RMSE represents a more accurate prediction of true concentrations. Using either an optimal- or random-removal order, locations were removed during each of the seven hazard mapping events. At each iteration, variograms were re-estimated with the reduced data and the average RMSE at removed locations was calculated.

To evaluate full-facility hazard maps, we kriged PNC and RMC when removing sample locations under optimal-removal, random-removal, and least-optimal-removal orders. We estimated kriged concentrations and variances at a fine spatial grid (31 488 cells; roughly 1.5 × 1.5 m) covering the facility floor. For each grid cell, we calculated the percent change and absolute percent change in concentration or kriging variance when compared with the initial estimation using all sample data. A negative percent change indicates underestimation of the hazard relative to the full-data map, while a percent change near zero indicates hazard estimates similar to those using all sample data. Because we combined the percent change across the seven hazard mapping events, we reported the absolute value so that differences between mapping events would not cancel out (i.e. over- and under-estimated concentrations summing to small net errors). This was particularly important for examining random-removal effects, which should result in both positive and negative percent changes. When evaluating the prediction performance of randomly removed sample sites, we assessed this separately for the 10 random-removal sample sets and reported both the mean and interquartile range (IQR) of the 10 outcomes to reflect variability.
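The evaluation metrics described above are simple to compute; the following Python sketch shows one way to do so. The function names and the numbers are hypothetical, and the quartile convention is whatever Python's `statistics.quantiles` uses by default, which may differ slightly from the convention used in the original R analysis.

```python
import math
import statistics


def rmse(measured, predicted):
    """Equation (5): root mean square error between paired values."""
    n = len(measured)
    return math.sqrt(sum((z - zh) ** 2 for z, zh in zip(measured, predicted)) / n)


def percent_change(reduced, full):
    """Signed percent change per grid cell relative to the full-data map.

    Negative values mean the reduced network underestimates the hazard.
    """
    return [100.0 * (r - f) / f for r, f in zip(reduced, full)]


def mean_abs_percent_change(reduced, full):
    """Average of absolute percent changes, so opposite-signed errors
    do not cancel when results are combined."""
    pc = percent_change(reduced, full)
    return sum(abs(p) for p in pc) / len(pc)


def mean_and_iqr(outcomes):
    """Mean and (Q1, Q3) of a set of outcomes, e.g. 10 random-removal sets."""
    q1, _, q3 = statistics.quantiles(outcomes, n=4)
    return statistics.mean(outcomes), (q1, q3)


# Hypothetical two-cell example: one cell is 10% low, the other 10% high.
full_map = [100.0, 100.0]
reduced_map = [90.0, 110.0]
signed = percent_change(reduced_map, full_map)
print(sum(signed) / len(signed))                        # signed errors cancel
print(mean_abs_percent_change(reduced_map, full_map))   # absolute errors do not
```

The two-cell example illustrates why the absolute percent change was reported for combined mapping events: the signed mean hides errors that the absolute mean exposes.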
An additional evaluation was made of the facility subset representing the upper 10th percentile of occupational hazards, i.e. those of greatest concern for worker safety. For each mapping event, we identified the grid cells (n = 3148) representing the upper 10th percentile of predicted concentrations. The percent changes in kriged concentrations and kriging variances were then extracted for just these regions. All analyses were performed using the R Statistical Software (v. 3.0.1) with the ‘sp’ and ‘gstat’ packages (Pebesma, 2004) for spatial analysis.

Results

Hazard mapping

The results of mapping PNC and RMC during the seven hazard mapping events are summarized in Table 1. Four events occurred during a cold season (i.e. January; n = 323 measurements) and three occurred during a warm season (i.e. May; n = 246 measurements). The overall mean of cold season PNC (133.4 × 1000 per cm³) was similar to that observed in the warm season (127.4 × 1000 per cm³). For RMC, the overall cold season mean (0.10 mg/m³) was slightly lower than that observed in the warm season (0.15 mg/m³). Maps of kriged PNC and RMC concentrations for each of the mapping events are shown in Fig. 1. We observe a heterogeneous distribution of PNC with high concentrations in the central portion of the facility. For RMC, we observe less variability during the January than the May sample periods, which is likely due to ventilation differences from heating and cooling systems. Although measured data were right-skewed, no transformation was performed (for additional details, see Supplementary data, available at Annals of Occupational Hygiene online).

Table 1. Summary of particle number concentrations and respirable mass concentrations observed in a heavy-vehicle manufacturing facility in January and May.
|                                                | January samples    | May samples        | Total samples      |
|------------------------------------------------|--------------------|--------------------|--------------------|
| Sample characteristics                         |                    |                    |                    |
| Mapping events                                 | 4                  | 3                  | 7                  |
| Total samples                                  | 323                | 246                | 569                |
| Particle number concentrations (×1000 per cm³) |                    |                    |                    |
| Mean (SD)                                      | 133.4 (70.0)       | 127.4 (92.8)       | 130.8 (80.7)       |
| Median (Min, Max)                              | 123.0 (17.8, 505.7)| 105.0 (19.9, 526.8)| 112.6 (17.8, 526.8)|
| Respirable mass concentration (mg/m³)          |                    |                    |                    |
| Mean (SD)                                      | 0.10 (0.06)        | 0.15 (0.10)        | 0.12 (0.09)        |
| Median (Min, Max)                              | 0.08 (0.02, 0.43)  | 0.12 (0.04, 0.89)  | 0.10 (0.02, 0.89)  |
| Correlation coefficient (a)                    |                    |                    |                    |
| Mean                                           | 0.21               | 0.39               | 0.29               |

(a) Pearson correlation coefficients between paired particle number concentrations and respirable mass concentrations averaged across sample days.

Figure 1. Mapped values of the (A) particle number concentrations (×1000 per cm³) and (B) respirable mass concentrations (mg/m³) for each of the seven mapping events. Green crosses represent sample locations (n = 82).

The standard deviation of PNC measurements was estimated across seven-hazard mapping events to reflect temporal variability. When averaged at sample locations, the PNC mean SD was 60.9 × 1000 per cm³ with a range of 12.1–154.0. For RMC, the mean SD at sample locations was 0.06 mg/m³ with a range of 0.02–0.32.
Poor correlation between PNC and RMC was observed (ρ = 0.30), consistent with aerosol generation from different sources.

Predicted SDs for PNC and RMC were estimated using kriging for the full facility floor. We observed the highest SD in a center band, with low predicted SD throughout the rest of the facility (Fig. 2). After normalizing PNC and RMC SD estimates, we found more variability in PNC compared with RMC (Fligner-Killeen P < 0.001). This indicates greater homogeneity of RMC measurements. When the kriged SD was averaged, we found PNC estimates to be smaller in January (mean SD = 47.4) compared with May sampling days (mean SD = 51.8; Student’s t-test P < 0.001), with similar results for RMC (January SD = 0.04, May SD = 0.05; Student’s t-test P < 0.001).

Figure 2. Kriged temporal variability (i.e. standard deviation across seven mapping events) of (A) particle number concentrations (×1000 per cm³) and (B) respirable mass concentrations (mg/m³). Green crosses represent sample locations (n = 82).

Prediction performance following sample location removal

Figure 3 represents the mean RMSE of the seven hazard mapping events as locations were removed under optimal- and random-removal orders. The envelopes represent the IQR of 7 RMSE values for optimal-removal and 70 RMSE values for random-removal (10 random sets per hazard mapping event). A within-mapping-event paired Student’s t-test showed significantly lower RMSE estimates when locations were removed in an optimal compared with a random-removal order (P < 0.001). While the IQR envelopes overlap in Fig. 3, these represent estimates across all mapping events, highlighting the variability in daily predictions.
If RMSE iterations were averaged, the mean RMSE was 49.8 under optimal-removal and 56.2 during random-removal.

Figure 3. The average root mean square error (RMSE) for kriged particle number concentrations of all seven mapping events when locations are eliminated under an (A) optimal-removal order and (B) random-removal order. Gray envelopes represent the interquartile range.

As locations were removed, the percent change in kriged PNC, relative to predictions using all sample locations, was averaged across mapping events and plotted against the number of locations in Fig. 4. This was presented for optimal-, least-optimal-, and random-removal orders. Across the full facility, there was no statistical difference in the absolute change of kriged predictions between optimal- and random-removal orders (Student’s t-test P > 0.30), although both predicted significantly better than a least-optimal-removal order (Fig. 4A). However, when focused on the upper 10th percentile of PNC concentrations, we found significantly better prediction with optimal-removal compared with random-removal (Fig. 4B). Under optimal-removal, eliminating 15, 30, and 50 sensor locations resulted in 1.8%, 1.3%, and 2.5% increases in PNC hazards compared with estimates using all data. This small prediction change implies that optimal-removal estimates hazards similarly to using all sample data. However, under random-removal of 15, 30, and 50 locations, decreases of 2.7%, 6.5%, and 18.4% in predictions were observed, indicating underestimation of PNC. A larger decrease was found with a least-optimal approach.
When examining the change in kriging variance from sensor removal, we found variance for PNC and RMC SDs to increase with fewer sample locations. However, no substantial differences between optimal-, random-, or least-optimal-removal orders were observed (results not shown). This indicates the importance of sensor numbers, rather than their location, for evaluating prediction precision.

Figure 4. The mean change in predicted particle number concentrations following the removal of sample locations for the (A) full facility and (B) upper 10th percentile of exposure locations averaged across the seven hazard mapping events. Circles denote estimates following optimal-removal of sample locations; squares denote estimates following a least-optimal-removal; triangles denote the median of 10 iterations of a random-removal with gray envelopes denoting the interquartile range of the iterations.

Since multiple hazards exist in occupational settings, we additionally evaluated sample optimization for RMC hazard mapping. We applied optimal- and least-optimal-removal orders based on RMC data for the full facility floor (Fig. 5A) and the facility floor representing the upper 10th percentile for RMC concentrations (Fig. 5B), along with estimates from a random-removal order.
Across both the full facility and the upper 10th percentile of locations, we found the absolute and actual change in RMC predictions to be similar under optimal- or random-removal orders, although both performed better than a least-optimal-removal order. These results indicate that for a spatially homogeneous pollutant like RMC, there was little prediction performance benefit from optimizing removal orders.

Figure 5. The mean change in predicted respirable mass concentrations following the removal of sample locations for the (A) full facility and (B) upper 10th percentile of exposure locations averaged across the seven hazard mapping events. Circles denote estimates following the optimal-removal of sample locations; squares denote estimates following a least-optimal-removal; triangles denote the median of 10 iterations of a random-removal with gray envelopes denoting the interquartile range of the random iterations.

For comparison with Figs 4 and 5, the actual change in kriged predictions across the full facility and the absolute percent change in predictions at the upper 10th percentile of locations for both PNC and RMC, respectively, are presented in Figs S2 and S3 in Supplementary data, available at Annals of Occupational Hygiene online. While these figures do not provide new conclusions, they present the direction of prediction bias when removing monitor locations.
Discussion

We used a statistical methodology to inform the removal order of sensor locations in a manufacturing facility based on preliminary hazard mapping data. Our focus was to preserve locations with high temporal variability, while producing accurate hazard maps. We found that, compared with random-removal, an optimal-removal order produces more accurate predictions of PNC at the upper 10th percentile of high-end exposures. However, there was no benefit in terms of prediction performance when evaluating the RMC measurements. If the goal of an occupational sampling campaign is robust hazard maps, statistical algorithms can successfully inform sensor networks for some hazards, but provide less value for homogeneously distributed pollutants where spatial dependence can be captured by few samples regardless of their location.

Several methods have been proposed to optimize sampling campaigns. Simulated annealing is a computationally intense approach using iterative algorithms to generate sampling possibilities (Brus and Heuvelink, 2007). Other methods incorporate land-use characteristics (Kumar, 2009) or spatial autocorrelation (Kanaroglou et al., 2005; Su et al., 2007) to minimize sample size, while maximizing pollutant variance. However, these approaches differ from ours in several ways. First, they focus on adding and not removing sample locations. Second, none are applied in occupational settings, which differ from ambient environments. Outdoor air pollutants are influenced by factors such as traffic, anthropogenic activities, and weather, whereas occupational settings are concerned with emission rates, dispersion, and contaminant decay. Third, their underlying approach is based on random selections of sample locations. An optimized random sample provides less utility for occupational settings where accessibility to sample locations often limits a sampling strategy.
To avoid exposure mischaracterization and adequately describe true concentrations, a preliminary mapping event should represent hazards across space and time (Koehler and Volckens, 2011). Previous research found temporality to be more influential than spatial variation in some prediction settings. Lake et al. (2015) identified spatial differences to play a smaller role compared with temporal variability in maps of noise exposure. Liu and Hammond (2010) determined that mapping of automotive particulates should target regions of high temporal variation for follow-up or replicate sampling. By using high-quality mobile sensors, we could produce a robust temporal and spatial data set of random and targeted locations. Although high-quality/high-cost sensors are infeasible for long-term sampling, a preliminary sample creates snapshots of concentrations across space and time. We can then utilize our method to identify long-term sample locations that capture temporal variability (e.g. high SD regions), while reducing spatial coverage to a reasonable number of locations. When comparing mapped hazards, we found RMC to show significantly less temporal variability (e.g. SD maps) compared with PNC. This reflects knowledge that ultrafine particles, which substantially contribute to PNC, rapidly decay from their sources. Alternatively, RMCs are more homogeneous, creating uniform and similar distributions (Evans et al., 2008; Liu and Hammond, 2010). With poor correlation between PNC and RMC (ρ = 0.30), it is possible that different pollutant sources exist within our facility. Heitbrink et al. (2007) found fine particles in an engine machining plant were likely caused by gas burners and fugitive emissions, while coarse particles originated from machinery and metal working fluid. Alternatively, PM10 and PM1.0 showed high correlation in an automotive plant, indicating a common source (Dasch et al., 2005). 
We found that, unlike PNC, the prediction of RMC did not perform substantially better for optimal-removal compared with random-removal, indicating less importance for specific sample locations. Therefore, in a multi-hazard occupational setting, we recommend that mapping events be optimized on the hazard displaying the greatest variability.
Our approach provides several important benefits for hazard mapping and sensor network designs. First, by optimally removing sample locations, we may improve exposure assessment of potential worker exposures, notably at the high concentration range of occupational hazards, while reducing costs. Optimally removing locations slightly overestimated concentrations, as opposed to underestimating concentrations with random-removal. Second, our method can be broadly applied to any hazard dataset showing spatially correlated data. Although our example focused on particulate matter, it can be used for sampling campaigns of other important occupational hazards with a direct-reading sensor, such as noise or gases. The rubric also provides flexibility by allowing locations with high priority to be maintained without impacting the selection process. This feature is important when critical locations are targeted a priori, including identifying areas that may be of high regulatory concern. Third, our method represents a cost- and time-saving tool to inform long-term sensor sites without the trial and error of sampling. By observing the change in hazard maps after removing sensors, researchers can balance financial expenditures on sampling networks against benefits of improved prediction.
There are several limitations to our study. Although kriging produces statistically optimal predictions, it introduces sources of uncertainty. Koehler and Peters (2012) found that the accuracy of kriged hazard maps was highly dependent upon the spatial correlation structure.
Variograms were re-estimated as each sample location was removed and the correlation parameters differed substantially. This variation has been shown to impact the optimization of sampling schemes (Van Groenigen, 2000), increasing uncertainty as fewer sample locations capture fewer separation distances. A second concern is that immutable obstacles (e.g. power access, walls, and equipment) in industrial settings impede true random sampling. A structured sample grid, while efficient, may not provide the best spatial estimates (Koehler and Volckens, 2011) because variograms rely on close distance measurements to minimize kriging variances (Van Groenigen, 2000). As a recommendation, mapping campaigns and sensor networks would benefit from additional locations to capture small distance hazard concentrations. A third concern is that kriged variances represent spatial uncertainty, but not other sources of uncertainty. Measurement and human-based error may cause interference bias and lack of precision or accuracy (Koehler and Volckens, 2011). Our measurements were also the average of 1-min samples, so the full range of exposure variability at each site is likely underrepresented. Fortunately, the impact of sampling error on hazard prediction was found to be minimal compared with unrepresentative sampling/mapping strategies (Koehler and Volckens, 2011).
A next research step would investigate how spatiotemporal methods can be used to address temporally correlated samples. Our approach applied a 2-fold process that first calculated temporal variability followed by a spatial assessment, but this can be done with a single space-time covariance structure (Bivand et al., 2008). Spatiotemporal analyses have been used for predictions of environmental characteristics, such as temperature (Heuvelink et al., 2010) and radiation (Heuvelink et al., 2012), but not occupational exposures.
It is speculated that assumptions of separable space-time dependence structures might not be consistent with industrial hazards that disperse by convection and diffusion (Koehler and Volckens, 2011), but this has not been tested. Limited research has investigated optimized sampling designs for spatiotemporal kriging (Heuvelink et al., 2012), and assessing it in an occupational setting would fill an important research gap.
Conclusions
We describe a novel method to inform the reduction of a sensor network that will preserve performance of predicted concentrations. Using a preliminary sample, removing sample locations in an optimal fashion produced more accurate predictions of PNC compared with random-removal and least-optimal-removal orders. This trend was particularly apparent in the regions of highest occupational exposures. For RMC hazards, we found no difference in prediction performance between optimal- and random-removal orders. The results demonstrate that for temporally varying hazards a statistical algorithm allows us to effectively reduce the number of sample locations needed to produce high-quality hazard maps. This ability is useful when researchers have collected a robust, but costly, pilot sampling campaign and hope to follow with a reduced number of optimally placed long-term samples for exposure assessment purposes. While we tested our approach on particulates, the method is modifiable and can be applied to different hazards and occupational settings.
Supplementary Data
Supplementary data are available at Annals of Work Exposures and Health online.
Declaration
Funding was provided by the National Institute for Occupational Safety and Health (NIOSH) of the Centers for Disease Control and Prevention under award R01OH010533. Its contents, including any opinions and/or conclusions expressed, are solely those of the authors.
Conflict of Interest
No conflicting interests are declared.
Acknowledgments
The authors thank Sam Jones, Levi Mines, Paul Puglisi, and Ben Peters for their efforts with sampling and Songhe Hu for preparing the data.
References
Beelen R, Hoek G, Pebesma E et al. (2009) Mapping of background air pollution at a fine spatial scale across the European Union. Sci Total Environ; 407: 1852–67.
Bivand RS, Pebesma EJ, Gómez-Rubio V. (2008) Applied spatial data analysis with R. New York, NY: Springer.
Bottenheim JW, Sirois A, Brice KA et al. (1994) Five years of continuous observations of PAN and ozone at a rural location in eastern Canada. J Geophys Res Atmospheres; 99: 5333–52.
Brus DJ, Heuvelink GBM. (2007) Optimization of sample patterns for universal kriging of environmental variables. Geoderma; 138: 86–95.
Chen CC, Chuang CL, Wu KY et al. (2009) Sampling strategies for occupational exposure assessment under generalized linear model. Ann Occup Hyg; 53: 509–21.
Clerc F, Vincent R. (2014) Assessment of occupational exposure to chemicals by air sampling for comparison with limit values: the influence of sampling strategy. Ann Occup Hyg; 58: 437–49.
Cressie NAC. (1993) Statistics for spatial data. Hoboken, NJ: John Wiley & Sons, Inc.
Dasch J, D’Arcy J, Gundrum A et al. (2005) Characterization of fine particles from machining in automotive plants. J Occup Environ Hyg; 2: 609–25.
Diggle PJ. (1998) Model‐based geostatistics. J R Stat Soc Ser C Appl Stat; 47: 299.
Evans DE, Heitbrink WA, Slavin TJ et al. (2008) Ultrafine and respirable particles in an automotive grey iron foundry. Ann Occup Hyg; 52: 9–21.
Heitbrink WA, Evans DE, Peters TM et al. (2007) Characterization and mapping of very fine particles in an engine machining and assembly facility. J Occup Environ Hyg; 4: 341–51.
Heuvelink GBM, Griffith DA, Hengl T et al. (2012) Sampling design optimization for space-time kriging. In Mateu J, Müller WG, editors. Spatio-temporal Design: Advances in Efficient Data Acquisition. Chichester, UK: Wiley, pp. 207–30. ISBN 9780470974292.
Heuvelink GBM, Jiang Z, Bruin SD et al. (2010) Optimization of mobile radioactivity monitoring networks. Int J Geogr Inf Sci; 24: 365–82.
Kanaroglou PS, Jerrett M, Morrison J et al. (2005) Establishing an air pollution monitoring network for intra-urban population exposure assessment: a location-allocation approach. Atmos Environ; 39: 2399–409.
Koehler KA, Peters TM. (2012) Influence of analysis methods on interpretation of hazard maps. Ann Occup Hyg; 57: 558–70.
Koehler KA, Volckens J. (2011) Prospects and pitfalls of occupational hazard mapping: ‘between these lines there be dragons.’ Ann Occup Hyg; 55: 829–40. doi:10.1093/annhyg/mer063
Koehler KA, Zhu J, Wang H et al. (2017) Sampling strategies for accurate hazard mapping of noise and other hazards using short-duration measurements. Ann Work Expo Health; 61: 183–94.
Kumar N. (2009) An optimal spatial sampling design for intra-urban population exposure assessment. Atmos Environ; 43: 1153.
Lake K, Zhu J, Wang H et al. (2015) Effects of data sparsity and spatiotemporal variability on hazard maps of workplace noise. J Occup Environ Hyg; 12: 256–65.
Liu S, Hammond SK. (2010) Mapping particulate matter at the body weld department in an automobile assembly plant. J Occup Environ Hyg; 7: 593–604.
O’Brien DM. (2003) Aerosol mapping of a facility with multiple cases of hypersensitivity pneumonitis: demonstration of mist reduction and a possible dose/response relationship. Appl Occup Environ Hyg; 18: 947–52.
Park JY, Ramachandran G, Raynor PC et al. (2010) Determination of particle concentration rankings by spatial mapping of particle surface area, number, and mass concentrations in a restaurant and a die casting plant. J Occup Environ Hyg; 7: 466–76.
Pebesma EJ. (2004) Multivariable geostatistics in S: the gstat package. Comput Geosci; 30: 683.
Peters TM, Anthony TR, Taylor C et al. (2012) Distribution of particle and gas concentrations in Swine gestation confined animal feeding operations. Ann Occup Hyg; 56: 1080–90.
Peters TM, Heitbrink WA, Evans DE et al. (2006) The mapping of fine and ultrafine particle concentrations in an engine machining and assembly facility. Ann Occup Hyg; 50: 249–57.
Su JG, Larson T, Baribeau AM et al. (2007) Spatial modeling for air pollution monitoring network design: example of residential woodsmoke. J Air Waste Manag Assoc; 57: 893–900.
Van Groenigen JW. (2000) The influence of variogram parameters on optimal sampling schemes for mapping by kriging. Geoderma; 97: 223–36.
Waller LA, Gotway CA. (2004) Applied spatial statistics for public health data. Hoboken, NJ: John Wiley & Sons, Inc.
© The Author(s) 2018. Published by Oxford University Press on behalf of the British Occupational Hygiene Society. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
Annals of Work Exposures and Health (formerly Annals Of Occupational Hygiene) – Oxford University Press
Published: Mar 17, 2018