An evaluation of a GARP model as an approach to predicting the spatial distribution of non‐vagile invertebrate species

INTRODUCTION

Understanding and delineating species distributions is an important tool for addressing numerous issues in ecology, biogeography and evolution (Guisan & Thuiller, 2005) and is fundamental to the conservation of biodiversity (Samways, 2005). Often, because of financial and political pressures, areas set aside for protection are small and located away from urban centres (Balmford et al., 2002); therefore, it is crucial to rigorously identify areas most worthy of protection when resources are limited. Sites afforded protection are often chosen because they harbour a high level of species richness or contain endangered species (Myers, 1990; Scott et al., 1993). The spatial distributional pattern of many taxa may be highly fragmented with populations representing evolutionarily significant units (Ryder, 1986; Avise & Nelson, 1989); thus, the full distributional ranges of the species deserve conservation consideration to maximize the retention of genetic and adaptive diversity. This is particularly important when designing reserves to protect low vagility organisms, such as small invertebrates, which can be highly susceptible to localized extirpations due to environmental perturbation (Murphy et al., 1990). Moreover, a network of habitat fragments for conservation must be prudently chosen, as some configurations are more prone to extinction than others (Gilpin, 1987; Neuhauser, 1998; Hanski, 2001). While knowledge of species distributional ranges is clearly necessary if effective measures are to be taken to preserve biodiversity (Brooks et al., 2004), the range limits of many species remain largely unknown or incompletely documented. Collections records may be biased, resulting in an inaccurate assessment of the species ranges (Nelson et al., 1990; Graham et al., 2004). There are several reasons why the full range of a species may be unclear.
Foremost, many regions remain unsampled for a given taxonomic group. The time and expense necessary to conduct an exhaustive sampling over a large area often preclude such endeavours. Even in areas where a concerted effort has been made to sample the regional fauna and flora, progress has often been hampered by poor access (e.g. private property, or a lack of trails or roads). Additionally, many species are small, cryptic, or otherwise easy to overlook and require special sampling techniques to detect. Finally, many taxa are known from only a few specimens encountered incidentally by non‐scientists, and as such, tend to come from highly populated areas. In the absence of complete distributional records and sampling, ecological niche modelling using Geographical Information Systems (GIS) data provides an approach that can potentially overcome some of the hurdles discussed above (Raxworthy et al., 2003). Numerous studies have employed a GIS spatial analysis approach, the Genetic Algorithm for Rule‐set Prediction (GARP; Stockwell & Peters, 1999), to locate suitable niches of an assortment of vertebrate taxa, including birds (Peterson et al., 2002a), rodents (Anderson et al., 2002), and fish (Wiley et al., 2003). GARP has even uncovered new species when presumed sister taxa were used to generate the model (Raxworthy et al., 2003). However, we see three major problems with the GARP approach. First, it is what is often characterized as a ‘black box’ technique. For example, there is no way to analyse the respective contributions of individual predictor variables to the model. Second, on only one occasion have models produced by GARP been evaluated by ground truthing (Raxworthy et al., 2003). Third, non‐vagile organisms potentially present a special problem with regard to modelling their distribution using GARP (or potentially any other technique), in particular problems associated with the resolution (i.e.
GIS spatial scale and layer resolution) of the modelled area and selection of appropriate environmental layers. In this study, we used GARP to create a model to predict the spatial distribution of a genus of trapdoor spiders using a limited set of museum collections records. First, we want to know whether a model produced by GARP can be used to accurately assess the spatial distribution of a taxon based on limited, and perhaps biased, collections records. As a direct result of our exhaustive sampling efforts in 2005, we now consider the distribution of Promyrmekiaphila to be well defined; prior to our investigation, the full extent of this spider's range was unknown. The ability to characterize an organism's spatial distribution using only museum data can save vast amounts of time, energy, and money necessary to engage in formal collecting expeditions (Graham et al., 2004), facilitate the delineation of areas for preservation (Ferrier et al., 2002), and guide collecting efforts for population, phylogeographical, and systematic studies. We also provide a limited comparison of the performance of the GARP model with two ‘simple’ modelling approaches, a climatic envelope model and a generalized linear model (GLM). Finally, we present a GARP model based solely on recently, formally collected data for comparison with the original GARP model. To our knowledge, there is a single published study evaluating the efficacy of GARP models to predict the spatial distribution of a taxon using actual ground truthing, i.e. going into the field and rigorously testing the accuracy and precision of the model (Raxworthy et al., 2003). The accuracy of GARP models has heretofore been analysed generally for species whose distributions are already known (e.g. Anderson et al., 2002, 2003; Peterson et al., 2002a; Loiselle et al., 2003). Furthermore, this paper will provide a limited comparative analysis of GARP with other predictive methods.
GARP models have been assumed to perform better than GLM or BIOCLIM models because GARP includes logistic regression and range envelopes as two of the four methods employed by default in the algorithm (Anderson et al., 2002; Stockwell , 1999; Peterson et al., 2002b); however, few comparative analyses have been published to support this postulate. Stockwell and Peterson (2002) claimed that GARP produced more accurate models than logistic regression, but other studies have shown no significant differences in model accuracy between GARP, BIOCLIM, and GLM models (Elith, 2000; Loiselle et al., 2003). Finally, this will mark the first neutral assessment of GARP in predicting the occurrence of not only a non‐vagile organism, but also a non‐invasive invertebrate. Often overlooked in conservation considerations, invertebrates represent the vast majority of the biological diversity on the planet. Indeed, arthropods comprise over 80% of all extant animal taxa (Ruppert et al., 2004), yet, sadly, remain largely ignored in conservation efforts (Wilson, 1987).

METHODS

The trapdoor spider genus Promyrmekiaphila Schenkel, 1950, is the focal taxon of this study (shown in Fig. 1). Three nominal species currently compose the genus, and all are endemic to the California Floristic Province, a biodiversity hotspot (Cincotta et al., 2000). Members of this genus are small‐ to medium‐sized trapdoor spiders belonging to the spider infraorder Mygalomorphae and are known to be found in mesic areas of northern and central California (Bond & Opell, 2002). They construct and inhabit silk‐lined burrows, typically located on steep banks, which are covered by a hinged silken‐soil trapdoor thought to have a protective function and known to assist in prey capture. They are fossorial, rarely travelling far from their burrow (as evidenced by multigenerational burrow aggregations), except when males wander short distances in search of females during mating season.
As with most mygalomorph spider species they have limited means of dispersal (Maine, 1982; Coyle, 1983), and are slow to reach sexual maturity, all characteristics which render them highly susceptible to both environmental perturbation and urban encroachment.

Figure 1. Map of California showing the study area and a live Promyrmekiaphila specimen.

We employed a geospatial analysis that uses an artificial intelligence method, GARP (available online, http://www.lifemapper.org/desktopgarp ), in concert with the Geographic Information Systems software package ArcView GIS 9.0 (ESRI, 2004). GARP infers correlations between environmental layers representing known species localities and a set of biotic and abiotic parameters. GARP is a genetic algorithm that produces predictive models for species distribution; however, because GARP is non‐deterministic, multiple optimal models are produced, and subsequent runs using the same data will produce slightly different results. GARP may be useful when only presence data are available because true absence data are not utilized as input. Pseudo‐absences are created in GARP by resampling from points within the study area where the species has not been designated as present; however, because these points are not true absence points, they potentially contain the species (Guisan & Zimmermann, 2000). The initial GARP model was based on specimens obtained on loan from museums and private collections. The least accurate specimen localities were removed to minimize the amount of heterogeneity in location accuracy, and, in order to control some of the bias inherent in natural collections records, some of the points were removed where multiple specimens were located within a relatively small area (Guisan & Thuiller, 2005). The locality data associated with the 42 remaining museum specimens were georeferenced using United States Geological Survey 1:25,000 topographic maps (Maps a la carte, Inc.).
Seven map layers were used in the GARP analysis representing parameters that, based on our collective 15 years of field experience, we know to influence the survival of these spiders. These environmental coverages were all obtained from various sources online and include elevation, aspect, soil texture, minimum temperature in January, maximum temperature in July, historic vegetation, and average annual precipitation (Table 1). All data layers were clipped in ArcGIS using a mask of our study area. The study area in northern California was defined roughly by Monterey County in the south, the California/Oregon border in the north, the Pacific Ocean to the west, and the Sierra Nevada Mountains to the east (Fig. 1). Most environmental layers were obtained as rasters; vector data layers were converted to rasters in ArcGIS with the Feature to Raster Conversion Tool. An executable, resample_clip_to_ascii (VanDerWal), was employed to project all layers to the Universal Transverse Mercator coordinate system North American Datum 1983, equalize the cell sizes to one arc‐second resolution (approximately 30 m²), and convert all layers to ASCII format, all of which are prerequisites for GARP. We set GARP to perform 20 runs per specimen with a convergence limit of 0.01 and 1000 maximum iterations. All four rule types (atomic, range, negated range, and logistic regression) were employed as well as the best subset feature of GARP. Under these settings, GARP produced a total of 420 models in which all cells are predicted either present (1) or absent (0). We then used the Summation feature in the Raster Calculator of ArcGIS to make a final, cumulative predictive map.
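The summation step can be illustrated with a minimal Python sketch. It is not the ArcGIS Raster Calculator workflow used in the study; the function name `sum_binary_models` and the tiny grids are hypothetical and serve only to show how cell-wise addition of binary (0/1) model outputs yields a cumulative map in which higher values mean that more models predicted presence.

```python
def sum_binary_models(models):
    """Cell-wise sum of equally shaped 0/1 grids, each given as a list of rows."""
    if not models:
        raise ValueError("at least one model grid is required")
    n_rows, n_cols = len(models[0]), len(models[0][0])
    total = [[0] * n_cols for _ in range(n_rows)]
    for grid in models:
        if len(grid) != n_rows or any(len(row) != n_cols for row in grid):
            raise ValueError("all grids must share the same shape")
        for r, row in enumerate(grid):
            for c, value in enumerate(row):
                if value not in (0, 1):
                    raise ValueError("grids must contain only 0 (absent) or 1 (present)")
                total[r][c] += value
    return total


# Three hypothetical 2 x 3 model outputs (illustrative values, not study data).
toy_models = [
    [[1, 0, 0], [1, 1, 0]],
    [[1, 0, 1], [0, 1, 0]],
    [[1, 0, 0], [1, 1, 0]],
]

cumulative = sum_binary_models(toy_models)

# Fraction of models predicting presence in each cell.
agreement = [[count / len(toy_models) for count in row] for row in cumulative]
```

Dividing each summed cell by the number of models gives the proportion of models in agreement, which is one way such a cumulative map can be read as a relative probability of presence.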
Table 1. Layer data used in GARP analyses.

GIS data layer: Source
Elevation: National elevation data, United States Geological Survey, http://seamless.usgs.gov/
Aspect: Derived from elevation data in ArcGIS
Maximum temperature in July: WorldClim, http://biogeo.berkeley.edu/worldclim/worldclim.htm
Minimum temperature in January: WorldClim, http://biogeo.berkeley.edu/worldclim/worldclim.htm
Rainfall: California Spatial Information Library, United States Geological Survey, http://gis.ca.gov/
Soil texture: State Soil Geographic (STATSGO) database, Natural Resources Conservation Service, http://www.ncgc.nrcs.usda.gov/products/datasets/statsgo/

Two additional predictive models were created based on the same museum data. A BIOCLIM model was created using DIVA‐GIS version 5.1.0.1 (Hijmans et al., 2001). The BIOCLIM model is based on all 19 available climatic variables and a 30 arc‐second resolution. The GLM was created using binary logistic regression analysis in SPSS version 12.0. GLMs require presence and absence data as input, so random pseudo‐absence points were created in DIVA‐GIS. The categorical variables (soil texture and vegetation) could not easily be included in the analysis; elevation and minimum temperature in January, the two most significant of the remaining layers according to the backward stepwise method, were included in the model. The GARP predictive map was used as a guide for field collection efforts during a 1‐week trip to the study area in March 2005 and 5 weeks in May and June 2005. We divided the map values into four classes: predicted absent, and low, medium, and high probability of presence. We attempted to collect spiders randomly throughout the study area, as well as across two transects. We identified two roads that traversed all four probability levels: Stewart Point Skaggs Springs Road in Sonoma County and Orr Springs Road in Mendocino County. Both roads run in a general east–west direction.
The GARP model shows the probability of species presence increases, in general, towards the coast. We sampled at regular intervals of six miles along both transects and gathered presence and absence data at each stop. The presence and absence locality data associated with specimens we collected in the field were subsequently used as independent test data to examine the performance of the three models produced above. One measure of accuracy was found using a comparison of each data point with its corresponding GARP model prediction value (Anderson et al., 2002). Further statistical assessments were derived from a confusion matrix and include χ² and Kappa statistics and commission and omission errors. Receiver operating characteristic (ROC) plots were obtained to provide a threshold‐independent measure of overall accuracy and to compare the predictive ability of the GARP, BIOCLIM, and GLM models. Upon return from our two collecting trips, we created a final GARP model using the same parameters and settings, but based solely on localities where we had collected from March–June 2005 to test whether data which have been formally collected, and hence are less susceptible to sampling bias, would greatly alter the predicted spatial distribution of the spiders. Furthermore, the locality data associated with our fieldwork, obtained through the use of a GPS unit, may be more accurate than some of the historical descriptions associated with the museum specimens.

RESULTS

The GARP model utilizing the 42 locality data points from museum specimens is shown in Fig. 2(a).
Areas where GARP predicts the highest probability of presence are predominantly in the immediate vicinity of the museum specimen records, but moderate to high levels of probability are also defined in areas where Promyrmekiaphila have not been previously collected, for example, in the northern part of the state near Eureka and the midsection of the Oregon/California border, the eastern slopes surrounding the Central Valley, and the Sutter Buttes, the remnants of a dormant volcano in the middle of the Central Valley. From March–June 2005, we collected Promyrmekiaphila at 80 newly discovered localities and determined that the spiders were absent at 49 different localities (Fig. 2b). Of those 129 points, the GARP model predicted 61 cells present with spiders and 68 absent. These values were incorporated into a confusion matrix (Table 2), which was used to calculate various measures of accuracy and error (Table 3).

Figure 2. Ecological niche distribution models for Promyrmekiaphila. Tan represents areas of predicted absence; green, light blue, and dark blue represent increasing probability levels of presence. (a) GARP map based on 42 locations from museum collections records; red dots represent museum specimens (scale bar = 35 km). (b–e) Red dots represent material collected from March–June 2005; crosses represent areas of absence: (b) GARP map based on museum localities; (c) GARP map based on recent material; (d) logistic regression map; and (e) BIOCLIM map. (f–g) Close‐up of the predictive GARP models showing collection results along two transects (red dots indicate presence, crosses represent absence; scale bar = 10 km): (f) map based on museum specimens; (g) map based on recently collected material.
Table 2. Confusion matrix created from material collected from March–June 2005.

                       Observed
Prediction     Present    Absent    Total
Present             41        20       61
Absent              39        29       68
Total               80        49      129

Table 3. Measurements of accuracy derived from the confusion matrix presented in Table 2.

Measure             Value
Commission index    0.408
Omission error      0.488
Kappa               0.097

Results of the χ² test demonstrate that the discrepancy between observed and expected frequencies is too large to be attributed to chance alone (χ² = 11.23, d.f. = 1, P < 0.001). Because our expected data were taken from the GARP model, the χ² test suggests that the proportions of cells actually observed present and absent do not match the proportions predicted from the GARP model. Results show moderately high degrees of both extrinsic commission error (false positives) and omission error (false negatives). Out of the 80 points positively identified as present, 39 were mistakenly predicted by the GARP model to be absent. Out of the 49 points identified as absent, 20 were falsely predicted as present. The χ² test evaluates GARP model significance but does not actually indicate the accuracy of the model. Although the GARP model was not a significant predictor of Promyrmekiaphila presence, in some instances non‐significant models may nonetheless show a high degree of accuracy (Anderson et al., 2002). This discrepancy is due to the difference between comparing proportions in a significance test and comparing each data point with its corresponding GARP prediction value in a test for accuracy (Anderson et al., 2002). For our data, only about half of the collection points (41 out of 80) fell into areas of predicted presence. Prediction of absence in the GARP model fared only slightly better, with 29 (out of 49) of the localities determined to be lacking Promyrmekiaphila falling into cells of predicted absence.
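The accuracy measures above can be recomputed from the four confusion matrix counts, as in the following minimal Python sketch. The formulas are assumptions chosen because they reproduce the reported values, not a description of the software the authors used: the commission index is taken as the share of observed absences predicted present, the omission error as the share of observed presences predicted absent, Kappa as Cohen's Kappa, and χ² as a goodness‐of‐fit comparison of observed totals (80 present, 49 absent) against the totals predicted by the model (61 present, 68 absent). The function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy measures for a 2 x 2 presence/absence confusion matrix.

    tp: predicted present, observed present
    fp: predicted present, observed absent
    fn: predicted absent,  observed present
    tn: predicted absent,  observed absent
    """
    n = tp + fp + fn + tn
    obs_present, obs_absent = tp + fn, fp + tn
    pred_present, pred_absent = tp + fp, fn + tn

    commission = fp / obs_absent    # false positives among observed absences
    omission = fn / obs_present     # false negatives among observed presences

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_chance = (pred_present * obs_present + pred_absent * obs_absent) / n ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance)

    # Assumed goodness-of-fit chi-square: observed totals vs. model-predicted totals.
    chi2 = ((obs_present - pred_present) ** 2 / pred_present
            + (obs_absent - pred_absent) ** 2 / pred_absent)

    return {"commission": commission, "omission": omission,
            "kappa": kappa, "chi2": chi2}


# Counts taken from Table 2.
metrics = confusion_metrics(tp=41, fp=20, fn=39, tn=29)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed definitions the sketch yields a commission index of about 0.408, an omission error of about 0.488, a Kappa of about 0.097, and a χ² of about 11.23, in line with Table 3 and the test statistic reported in the text.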
Along the transects, the observed state of the localities often did not correspond to the predicted state; however, we did detect a general trend of decreasing spider abundance as we travelled eastward, a pattern also exhibited in the GARP model. The Kappa statistic is often used as a measure of overall accuracy because it incorporates all of the information contained within the confusion matrix (Fielding & Bell, 1997). Based on the assessment by Landis and Koch (1977) that K < 0.4 is poor, the Kappa value calculated from our collection data suggests that the GARP model performed very poorly (K = 0.097). According to the ROC plot (Fig. 3), the area under the curve (AUC) for the GARP model was 0.556, suggesting that using the model is little better than simply guessing or flipping a coin. In contrast, the BIOCLIM and GLM models (Fig. 2d–e) not only displayed greater AUCs than the GARP model, but they were also significant predictors of the spatial distribution of Promyrmekiaphila (Table 4).

Figure 3. ROC plots of GARP, BIOCLIM, and GLM models based on museum records, evaluated using recently collected material.

Table 4. Measurements associated with the ROC plots displayed in Fig. 3.

Model      AUC      Std. error    Asymptotic sig.
BIOCLIM    0.699    0.046         0.000
GARP       0.556    0.053         0.287
LOG_REG    0.662    0.052         0.002

The GARP model based solely on localities where we had collected from March–June 2005 is shown in Fig. 2(c), and displays an overall effect of range reduction, that is, a further restriction of the area considered to be a suitable habitat, although the model does exhibit an increase in the probability of occurrence in many cells in the immediate vicinity of the collection sites along the transects, as would be expected (Fig. 2f–g). However, we suspect that the environmental conditions are similar to the north and south of the transects, areas where GARP showed only a moderate increase in the number of cells of predicted presence, even after the addition of the March–June data.
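As an aside on how the AUC values in Table 4 can be read, the following minimal Python sketch computes the area under the ROC curve as the probability that a randomly chosen presence locality receives a higher model score than a randomly chosen absence locality, with ties counted as one half. This rank‐based formulation is equivalent to the area under the empirical ROC curve; it is not the SPSS procedure behind the reported values, and the function name `auc_from_scores` and the scores below are hypothetical.

```python
def auc_from_scores(scores_present, scores_absent):
    """AUC as the share of presence/absence pairs ranked correctly (ties count 0.5)."""
    if not scores_present or not scores_absent:
        raise ValueError("both presence and absence scores are required")
    wins = 0.0
    for p in scores_present:
        for a in scores_absent:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_present) * len(scores_absent))


# Hypothetical model scores at field localities (illustrative values, not study data).
toy_present = [3, 2, 2, 0]   # scores where spiders were found
toy_absent = [1, 0, 2]       # scores where spiders were not found

toy_auc = auc_from_scores(toy_present, toy_absent)

# A model that gives every locality the same score cannot discriminate at all.
uninformative_auc = auc_from_scores([1, 1], [1, 1])
```

An AUC of 0.5 corresponds to the uninformative case, which is why a value of 0.556 indicates performance close to chance.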
The addition of the new data did convert the areas of moderate probability of species occurrence near the eastern end of the transects to areas of predicted absence. Although no true absence data were employed in the GARP models, we did not find Promyrmekiaphila in those locations. Locality data for Promyrmekiaphila collected from March–June 2005 can be accessed at http://www.mygalomorphae.org .

DISCUSSION

Compared to the other more computationally simplistic approaches (BIOCLIM and GLM) examined for our data, the GARP‐produced model failed miserably in its ability to accurately reconstruct the distribution of Promyrmekiaphila throughout northern and central California. Because the BIOCLIM and GLM models were produced using a slightly different set of predictive variables, conclusions that can be drawn from comparisons to the GARP model are obviously limited. Nevertheless, these simpler models are not as computationally intensive and are much easier to produce, qualities that would be attractive to systematists not specializing in spatial modelling. We attribute the failure of the GARP model to a number of issues related to the restricted spatial distribution of the study taxon and other issues related to the opaque nature of the GARP algorithm. Primarily, resolution, or spatial scale, and predictor variables are two components that must be prudently chosen when constructing any distributional model. Organisms, like Promyrmekiaphila, with low vagility and poor dispersal capabilities may further complicate these decisions due to their inherent isolation. Problems associated with resolution of spatial data in GIS analyses can be particularly pervasive when parcels of suitable habitat are disjunct. The area that a small, non‐vagile species inhabits may be only a few metres or less; thus, finer resolution usually provides better predictive ability in models (Guisan & Thuiller, 2005).
An increase in data resolution is therefore one of the primary factors necessary to increase model prediction accuracy, particularly for areas with microtopographic variation (Guisan & Zimmermann, 2000). Such a complex landscape defines our study area in northern California. Guisan and Thuiller (2005) argue that, for sessile organisms, not only must the combination of all suitable conditions be present within a cell, but they must all be present at the same specific location within the cell. This requirement probably extends to highly non‐vagile species as well. The ecological GIS data sets (e.g. layers) used for modelling are available at various resolutions, but even the highest resolution data sets can belie the heterogeneity of the landscape. Consequently, modelling programs tend to identify large tracts as suitable (or unsuitable) although the habitat is actually heterogeneous (Cowley et al., 1999). Aside from issues of scale, which can overlook the isolated nature of suitable habitat, isolation clearly presents other difficulties, especially for non‐vagile species. This difficulty stems from the fact that the actual species range can potentially be overpredicted. Ecological niche models define areas that are ecologically suitable for a species, but many factors may be responsible for the absence of the species in such an area. Biotic interactions (e.g. predators or competitors) may have precluded a species from an otherwise suitable area. Historical geological factors may have hampered dispersal to certain areas and potentially represent a severe limiting factor in predictive models because they are not accounted for in the model (Guisan & Zimmermann, 2000). Or perhaps the species has gone extinct as the result of natural events or, more likely, of recent human activities. All of the aforementioned factors intersect to form the species' realized niche (Soberón & Peterson, 2005).
That said, it is important to note that there is some debate about whether spatial distribution models define a species' fundamental niche or realized niche. Guisan and Thuiller (2005) state that most of the literature assumes, without proper evidence, that spatial models represent the realized niche of the species, because their observed distributions are already constrained by biotic interactions and limiting resources. A limited degree of information regarding biotic interactions and accessibility (a function of historical geological factors and dispersal capabilities) may be introduced indirectly into GARP via the pseudo‐absences (Soberón & Peterson, 2005). The realized niche may be significantly smaller than the fundamental niche if the degree and influence of biotic interactions are large. For Promyrmekiaphila, we suspect that biotic interactions do not play an overwhelming role in delimiting distribution. These spiders are generalist predators known to occur frequently alongside other burrowing spiders, and they themselves have few predators, primarily parasitic pompilid wasps, which, based on extensive field observations, do not decimate populations. Accessibility to suitable habitat, however, is expected to be a major factor that severely constrains the realized niche to a much smaller portion of the fundamental niche due to the fossorial nature and limited dispersal capabilities of these spiders. In addition, the genus may not have had time to reach a state of equilibrium with its environment (Guisan & Thuiller, 2005). Animals with adequate dispersal capabilities are often able to reach isolated patches of suitable habitat, but highly non‐vagile species are more constrained, and even in relatively continuous areas of suitable habitat, they may not have been present long enough to have dispersed throughout the area.
Although some work has been done with plants which incorporates dispersal capabilities into the model (Iverson et al., 1999; Dullinger et al., 2004), we know of no GIS layers that could incorporate this information; therefore, some degree of overprediction is expected. Overprediction, also called commission error or false positives, is desirable to some extent. Whereas areas predicted as absent may be falsified by the discovery of the species at that location, areas designated as lacking spiders can never truly be proved as such, even after extensive searching. For this reason, the commission error is often referred to, more correctly, as a commission index. An extremely low commission index means the data have been overfitted; essentially, the model only predicts the species to occur in and around the original presence localities used as training points in the model. Areas of overprediction are, theoretically, places of suitable habitat. Although not all of the areas of predicted presence will be occupied, these fragments of predicted occurrence can potentially be used to test evolutionary hypotheses regarding vicariant speciation. For instance, some areas of intermediate habitat may have acted as corridors between areas of currently occupied habitat. Overprediction has also been used to predict the potential distributions of invasive species including several invertebrate species (Ganeshaiah et al., 2003; Soberon et al., 2001; Roura‐Pascual et al., 2005). However, one of the defining characteristics of any invasive species is the ability to disperse easily and widely (Richardson et al., 2000). For any organism, dispersal from one area of suitable habitat to another may be prevented by unfavourable environmental conditions in the intermediate area which act as ecological barriers. However, overprediction becomes a major factor with non‐vagile species because the degree of isolation increases as the vagility of an organism decreases.
This may result in too many false positives, in which case the model is rendered a failure. Many of the areas determined to be overpredicted as a result of our recent collection efforts in the field did not match our expectations of suitable habitat. We made numerous stops in Siskiyou County, at the northern reaches of the study area, from which Promyrmekiaphila is not known but where it is predicted to occur by the GARP model. The absence of Promyrmekiaphila at those locations could be due to the inability of these spiders to disperse over great distances, or the presence of uninhabitable niches lying intermediate between the areas in Siskiyou and the northernmost area from which they are known (near Redding, Shasta County). However, because the areas in Siskiyou County where the GARP model predicted high probabilities of occurrence appeared to us to be too rocky and lacking in the amount of soil necessary to constitute suitable Promyrmekiaphila habitat, we suspect that these overpredicted areas are an artefact of model failure, due perhaps to the lack of necessary soil information contained in the environmental layers. Not including the factors critical to delimiting species distributions may cause GARP models to fail. The underlying premise of these models is that predictable relationships exist between the occurrence of a species and the environment. However, for many species, especially fossorial invertebrate taxa, very little is known with regard to their habitat requirements. This is the case for Promyrmekiaphila, a genus for which very little habitat data have been published. For our analyses we selected environmental parameters that we believe play an important role in delimiting the spatial distribution of these spiders. However, it is possible that a critical environmental factor was omitted.
Guisan and Thuiller (2005) suggest that fine‐scale resolution, as exhibited in Promyrmekiaphila, is controlled by a patchy distribution of resources that are assimilated by the species, such as water. Because these organisms are sedentary burrowers, it is possible that specific soil features that were not captured in our soil data are critical. The addition of environmental layers such as water holding capacity of the soil may improve the model. Outside of this study, most of our collecting efforts are focused on areas that appear to offer suitable habitat. In contrast, during our fieldwork in 2005, we directed much of our collecting efforts to areas of predicted absence because the degree to which our collecting efforts in areas of predicted absence conform to the model provides us with a gauge of model accuracy (Anderson et al., 2003). Although the vast majority of the absence points fell into cells of predicted absence, the GARP models did exhibit a high degree of omission error; almost half of the total points identified as present upon observation had been misidentified by the GARP model as absent. Although the relative costs of commission and omission errors will vary according to the particular aim of the model, omission error is particularly egregious when using distribution models in the determination of areas for conservation consideration, because the omission of areas that actually contain populations may make the species susceptible to the loss of genetic diversity (Fielding , 1997). Stockwell and Peterson (2002) suggest that too many classes within an environmental layer may result in high omission error. One way to minimize this could be to combine some of the classes within the categorical layers. Another potential problem with GARP modelling occurs when the phylogeny of the study organism is unresolved. Currently, three nominal sibling species compose the genus Promyrmekiaphila, but the correct taxonomic position of these species is unknown.
If the species are closely related sister taxa, speciation arose via vicariance, and the niche is conserved, then the model should predict all species equally well. Conversely, if the niche is not conserved, then the model may not perform well. In our study there are three areas that differ significantly from the redwood and mixed hardwood forests from which most of the specimens are known. This might also suggest areas of secondary contact where the predicted ranges of two species overlap.

In the future, generalized additive models (GAMs) could be used to generate a model from presence‐only data and computer‐generated pseudo‐absences. This approach, although not as accurate as using true presence/absence data, has been shown to provide significantly predictive models (Ferrier & Watson, 1997; Zaniewski et al., 2002) with results that are easy to read and interpret (Guisan & Thuiller, 2005). It could provide a useful alternative to GARP when the spatial distribution of a species is desired but exhaustive sampling is not feasible.

CONCLUSION

A GARP model created from historical museum collections records of a genus of trapdoor spiders proved to be an unreliable guide for collecting in the field and did not provide an adequate representation of their true present‐day spatial distribution. We suspect that the extremely sedentary nature of trapdoor spiders and the absence of significant environmental layers are the primary causes of the inaccuracy of the GARP model produced in our study. These spiders do not require large tracts of suitable habitat, and the small patches where they are often located are too fine‐scale to be represented in the currently available environmental data sets. The GARP model exhibits a large degree of overprediction, which is somewhat expected and, as discussed earlier, not necessarily a negative attribute of such models.
However, despite the large degree of overprediction, the GARP model failed to predict spiders in several large‐scale areas where spiders were present. For organisms of low vagility, most currently available data sets are at a resolution too coarse to provide accurate predictions. GARP models may perform better at a coarser resolution; however, this would provide only an ‘extent of occurrence’ map. These maps represent a gross geographical scale and, because the organism is unlikely to inhabit all areas within the extent of occurrence, primarily delineate the extreme boundaries of the distributional range. A more realistic depiction of the actual range is given by ‘areas of occupancy’, a network of sometimes disjunct patches of habitat of various levels of suitability (Gaston, 1994).

GARP models have previously been used to characterize the distribution of more vagile taxa and are alleged to be useful in predicting future distributions of invasive species, as well as in modelling potential habitat translocations due to climate change. However, based on the results of our study, our comparisons to other modelling approaches, and the current unavailability of data at an appropriate resolution, we do not recommend that GARP be used as an approach to modelling species distributions of non‐vagile trapdoor spiders, and we suspect that this recommendation should be extended, at a minimum, to all non‐vagile invertebrate species. Moreover, we do not recommend GARP as a first approach to modelling species distributions for systematists because, as demonstrated, models generated by simpler methods with fewer variable types perform better. Unlike BIOCLIM or GLM, GARP is difficult to implement, is computationally intensive, and, because it is a ‘black box’ approach, produces outcomes that are virtually impossible to interpret and troubleshoot.

ACKNOWLEDGEMENTS

Brent Hendrixson and three anonymous reviewers made useful comments on an earlier draft of this paper.
This work was supported by National Science Foundation Grant DEB 0315160.

Diversity and Distributions (Wiley)

Publisher: Wiley
Copyright: © 2006 Wiley Subscription Services, Inc., A Wiley Company
ISSN: 1366-9516
eISSN: 1472-4642
DOI: 10.1111/j.1366-9516.2006.00225.x

Abstract

INTRODUCTION Understanding and delineating species distributions are an important tool used to address numerous issues in ecology, biogeography and evolution ( Guisan & Thuiller, 2005 ) and are fundamental to the conservation of biodiversity ( Samways, 2005 ). Often, because of financial and political pressures, areas set aside for protection are small and located away from urban centres ( Balmford ., 2002 ); therefore, it is crucial to rigorously identify areas most worthy of protection when resources are limited. Sites afforded protection are often chosen because they harbour a high level of species richness or contain endangered species ( Myers, 1990 ; Scott ., 1993 ). The spatial distributional pattern of many taxa may be highly fragmented with populations representing evolutionary significant units ( Ryder, 1986 ; Avise & Nelson, 1989 ); thus, the full distributional ranges of the species deserve conservation consideration to maximize the retention of genetic and adaptive diversity. This is particularly important when designing reserves to protect low vagility organisms, such as small invertebrates, which can be highly susceptible to localized extirpations due to environmental perturbation ( Murphy ., 1990 ). Moreover, selection of a network of habitat fragments for conservation must be prudently chosen, as some configurations are more prone to extinction than others ( Gilpin, 1987 ; Neuhauser, 1998 ; Hanski, 2001 ). While knowledge of species distributional ranges is clearly necessary if effective measures are to be taken to preserve biodiversity ( Brooks ., 2004 ), the range limits of many species remain largely unknown or are incomplete. Collections records may be biased, resulting in an inaccurate assessment of the species ranges ( Nelson ., 1990 ; Graham ., 2004 ). There are several reasons why the full range of a species may be unclear. Foremost, many regions remain unsampled for a given taxonomic group. 
The time and expense necessary to conduct an exhaustive sampling over a large area often preclude such endeavours. Even in areas where a concerted effort has been made to sample the regional fauna and flora, progress has often been hampered by poor access (i.e. private property, no trails or roads, etc.). Additionally, many species are small, cryptic, or otherwise easy to overlook and require special sampling techniques to detect. Finally, many taxa are known from only a few specimens encountered incidentally by non‐scientists, and as such, tend to come from highly populated areas. In the absence of complete distributional records and sampling, ecological niche modelling using Geographical Information Systems (GIS) data provides an approach that can potentially overcome some of the hurdles discussed above ( Raxworthy ., 2003 ). Numerous studies have employed a GIS spatial analysis approach, the Genetic Algorithm for Rule‐set Prediction (GARP — Stockwell & Peters, 1999 ), to locate suitable niches of an assortment of vertebrate taxa, including birds ( Peterson ., 2002a ), rodents ( Anderson ., 2002 ), and fish ( Wiley ., 2003 ). GARP has even uncovered new species when presumed sister taxa were used to generate the model ( Raxworthy ., 2003 ). However, we see three major problems with the GARP approach. First, it is what is often characterized as a ‘black box’ technique. For example, there is no way to analyse the respective contributions of individual predictor variables to the model. Second, on only one occasion have models produced by GARP been evaluated by ground truthing ( Raxworthy ., 2003 ), and third, non‐vagile organisms potentially present a special problem with regard to modelling their distribution using GARP (or potentially any other technique), in particular, problems associated with the resolution (i.e. GIS spatial scale and layer resolution) of the modelled area and selection of appropriate environmental layers. 
In this study, we used GARP to create a model to predict the spatial distribution of a genus of trapdoor spiders using a limited set of museum collections records. First, we want to know whether a model produced by GARP can be used to accurately assess the spatial distribution of a taxa based on limited, and perhaps biased, collections records. As a direct result of our exhaustive sampling efforts in 2005, we now consider the distribution of Promyrmekiaphila to be well defined; prior to our investigation, the full extent of this spider's range was unknown. The ability to characterize an organism's spatial distribution using only museum data can save vast amounts of time, energy, and money necessary to engage in formal collecting expeditions ( Graham ., 2004 ), facilitate the delineation of areas for preservation ( Ferrier ., 2002 ), and guide collecting efforts for population, phylogeographical, and systematic studies. We also provide a limited comparison of the performance of the GARP model with two ‘simple’ modelling approaches, a climatic envelope model and a generalized linear model (GLM). Finally, we present a GARP model based solely on recently, formally collected data for comparison with the original GARP model. To our knowledge, there is a single published study evaluating the efficacy of GARP models to predict the spatial distribution of a taxon, using actual ground truthing (i.e. going into the field and rigorously testing the accuracy and precision of the model) ( Raxworthy ., 2003 ). The accuracy of GARP models has heretofore been analysed generally for species whose distributions are already known (e.g. Anderson ., 2002, 2003 , e.g., Peterson ., 2002a ; Loiselle ., 2003 ). Furthermore, this paper will provide a limited comparative analysis of GARP with other predictive methods. 
GARP models have been assumed to perform better than GLM or BIOCLIM models because GARP includes logistic regression and range envelopes as two of the four methods employed by default in the algorithm ( Anderson ., 2002 ; Stockwell , 1999 ; Peterson ., 2002b ); however, few comparative analyses have been published to support this postulate. Stockwell and Peterson (2002 ) claimed that GARP produced more accurate models than logistic regression, but other studies have shown no significant differences in model accuracy between GARP, BIOCLIM, and GLM models ( Elith, 2000 ; Loiselle ., 2003 ). Finally, this will mark the first neutral assessment of GARP in predicting the occurrence of not only a non‐vagile organism, but also a non‐invasive invertebrate. Often overlooked in conservation considerations, invertebrates represent the vast majority of the biological diversity on the planet. Indeed, arthropods comprise over 80% of all extant animal taxa ( Ruppert ., 2004 ), yet, sadly, remain largely ignored in conservation efforts ( Wilson, 1987 ). METHODS The trapdoor spider genus Promyrmekiaphila Schenkel, 1950, is the focal taxon of this study (shown in Fig. 1 ). Three nominal species currently compose the genus, and all are endemic to the California Floristic Province, a Biodiversity Hotspot ( Cincotta ., 2000 ). Members of this genus are small‐ to medium‐sized trapdoor spiders belonging to the spider infraorder Mygalomorphae and are known to be found in mesic areas of northern and central California ( Bond & Opell, 2002 ). They construct and inhabit silk‐lined burrows, typically located on steep banks, which are covered by a hinged silken‐soil trapdoor thought to have a protective function and known to assist in prey capture. They are fossorial, rarely travelling far from their burrow (as evidenced by multigenerational burrow aggregations), except when males wander short distances in search of females during mating season. 
As with most mygalomorph spider species they have limited means of dispersal ( Maine, 1982 ; Coyle, 1983 ), and are slow to reach sexual maturity — all characteristics which render them highly susceptible to both environmental perturbation and urban encroachment. 1 Map of California showing the study area and a live Promyrmekiaphila specimen. We employed a geospatial analysis that uses an artificial intelligence method, GARP (available online, http://www.lifemapper.org/desktopgarp ), in concert with the Geographic Information Systems software package, arc view GIS 9.0 ( ESRI, 2004 ). GARP infers correlations between environmental layers representing known species localities and a set of biotic and abiotic parameters. GARP is a genetic algorithm that produces predictive models for species distribution; however, because GARP is non‐deterministic, multiple optimal models are produced, and subsequent runs using the same data will produce slightly different results. GARP may be useful when only presence data are available because true absence data are not utilized as input. Pseudo‐absences are created in GARP by resampling from points within the study area where the species has not been designated as present; however, because these points are not true absence points, these points potentially contain the species ( Guisan & Zimmermann, 2000 ). The initial GARP model was based on specimens obtained on loan from museums and private collections. The least accurate specimen localities were removed to minimize the amount of heterogeneity in location accuracy, and, in order to control some of the bias inherent in natural collections records, some of the points were removed where multiple specimens were located within a relatively small area ( Guisan & Thuiller, 2005 ). The locality data associated with the 42 remaining museum specimens were georeferenced using United States Geological survey 1 : 25,000 topographic maps (Maps a la carte, Inc.). 
Seven map layers were used in the GARP analysis representing parameters that, based on our collective 15 years of field experience, we know to influence the survival of these spiders. These environmental coverages were all obtained from various sources online and include elevation, aspect, soil texture, minimum temperature in January, maximum temperature in July, historic vegetation, and average annual precipitation ( Table 1 ). All data layers were clipped in arcgis using a mask of our study area. The study area in northern California was defined roughly by Monterey County in the south, the California/Oregon border in the north, the Pacific Ocean to the west, and the Sierra Nevada Mountains to the east ( Fig. 1 ). Most environmental layers were obtained as rasters; vector data layers were converted to rasters in arcgis with the Feature to Raster Conversion Tool. An executable, resample _ clip _ to _ ascii (VanDerWal), was employed to project all layers to the Universal Transverse Mercator coordinate system North American Datum 1983 and equalize the cell sizes to one arc‐second resolution (approximately 30 m 2 ), as well as convert all layers to ASCII format — all prerequisites for GARP. We set GARP to perform 20 runs per specimen with a convergence limit of 0.01 and 1000 maximum iterations. All four rule types (atomic, range, negated range, and logistic regression) were employed as well as the best subset feature of GARP. Under these settings, GARP produced a total of 420 models in which all cells are predicted either present (1) or absent (0). We then used the Summation feature in the Raster Calculator of arcgis to make a final, cumulative predictive map. 
1 Layer data used in GARP analyses GIS data layers Source Elevation National elevation data, United States Geological Survey, http://seamless.usgs.gov/ Aspect Derived from elevation data in arcgis Maximum temperature in July WorldClim, http://biogeo.berkeley.edu/worldclim/worldclim.htm Minimum temperature in January WorldClim, http://biogeo.berkeley.edu/worldclim/worldclim.htm Rainfall California Spatial Information Library, United States Geological Survey, http://gis.ca.gov/ Soil texture State Soil Geographic (STATSGO) database, Natural Resources Conservation Service, http://www.ncgc.nrcs.usda.gov/products/datasets/statsgo/ Two additional predictive models were created based on the same museum data. A BIOCLIM model was created using diva – gis version 5.1.0.1 ( Hijmans ., 2001 ). The BIOCLIM model is based on all 19 available climatic variables and a 30 arc‐second resolution. The GLM was created using binary logistic regression analysis in spss version 12.0. GLMs require presence and absence data as input, so random pseudo‐absence points were created in diva – gis . The categorical variables — soil texture and vegetation — could not easily be included in the analysis; elevation and minimum temperature in January, the two most significant of the remaining layers according to the backward stepwise method, were included in the model. The GARP predictive map was used as a guide for field collection efforts during a 1‐week trip to the study area in March 2005 and 5 weeks in May and June 2005. We divided the map values into four classes: predicted absent, and low, medium, and high probability of presence. We attempted to collect spiders randomly throughout the study area, as well as across two transects. We identified two roads that traversed all four probability levels — Stewart Point Skaggs Springs Road in Sonoma County and Orr Springs Road in Mendocino County. Both roads run in a general east–west direction. 
The GARP model shows the probability of species presence increases, in general, towards the coast. We sampled at regular intervals of six miles along both transects and gathered presence and absence data at each stop. The presence and absence locality data associated with specimens we collected in the field were subsequently used as independent test data to examine the performance of the three models produced above. One measure of accuracy was found using a comparison of each data point with its corresponding GARP model prediction value ( Anderson ., 2002 ). Further statistical assessments were derived from a confusion matrix and include χ 2 and Kappa statistics and commission and omission errors. Receiver operating characteristic (ROC) plots were obtained to provide a threshold‐independent measure of overall accuracy and to compare the predictive ability of the GARP, BIOCLIM, and GLM models. Upon return from our two collecting trips, we created a final GARP model using the same parameters and settings, but based solely on localities where we had collected from March–June 2005 to test whether data which have been formally collected, and hence are less susceptible to sampling bias, would greatly alter the predicted spatial distribution of the spiders. Furthermore, the locality data associated with our fieldwork, obtained through the use of a GPS unit, may be more accurate than some of the historical descriptions associated with the museum specimens. RESULTS The GARP model utilizing the 42 locality data points from museum specimens is shown in Fig. 2(a) . 
Areas where GARP predicts the highest probability of presence are predominantly in the immediate vicinity of the museum specimen records, but moderate to high levels of probability are also defined in areas where Promyrmekiaphila have not been previously collected, for example, in the northern part of the state near Eureka and the midsection of the Oregon/California border, the eastern slopes surrounding the Central Valley, and the Sutter Buttes — the remnants of a dormant volcano in the middle of the Central Valley. From March–June 2005, we collected Promyrmekiaphila at 80 newly discovered localities and determined that the spiders were absent at 49 different localities ( Fig. 2b ). Of those 129 points, the GARP model predicted 61 cells present with spiders and 68 absent. These values were incorporated into a confusion matrix ( Table 2 ), which was used to calculate various measures of accuracy and error ( Table 3 ). 2 Ecological niche distribution models for Promyrmekiaphila. Tan represents areas of predicted absence; green, light blue, and dark blue represent increasing probability levels of presence. (a) GARP map based on 42 locations from museum collections records; red dots represent museum specimens (scale bar = 35 km). (b–e) Red dots represent material collected from March–June 2005; crosses represent areas of absence: (b) GARP map based on museum localities; (c) GARP map based on recent material; (d) logistic regression map; and (e) BIOCLIM map. (f–g) Close‐up of the predictive GARP models showing collection results along two transects (red dots indicate presence, crosses represent absence; scale bar = 10 km): (f) map based on museum specimens; (g) map based on recently collected material. 
2 Confusion matrix created from material collected from March–June 2005 Prediction Observed Present Absent Total Present 41 20 61 Absent 39 29 68 Total 80 49 129 3 Measurements of accuracy derived from the confusion matrix presented in Table 2 Measure Value Commission index 0.408 Omission error 0.488 Kappa 0.097 Results of the χ 2 test demonstrate that the discrepancy between observed and expected frequencies is too large to be attributed to chance alone (χ 2 = 11.23, d.f. = 1, P < 0.001). Because our expected data were taken from the GARP model, the χ 2 test suggests that the proportion of cells actually observed present and absent, respectively, do not match the proportions predicted from the GARP model. Results show moderately high degrees of both extrinsic commission error (false positives) and omission error (false negatives). Out of the 80 points positively identified as present, 39 were mistakenly predicted by the GARP model to be absent. Out of the 49 points identified as absent, 20 were falsely predicted as present. The χ 2 test evaluates GARP model significance but does not actually indicate the accuracy of the model. Although the GARP model was not a significant predictor of Promyrmekiaphila presence, in some instances non‐significant models may nonetheless show a high degree of accuracy ( Anderson ., 2002 ). This discrepancy is due to the difference between comparing proportions in a significance test and comparing each data point with its corresponding GARP prediction value in a test for accuracy ( Anderson ., 2002 ). For our data, only half of the collection points (41 out of 80) fell into areas of predicted presence. Predictivity of absence in the GARP model fared only slightly better, with 29 (out of 49) of the localities determined to be lacking Promyrmekiaphila falling into cells of predicted absence. 
Along the transects, the observed state of the localities often did not correspond to the predicted state; however, we did detect a general trend of decreasing spider abundance as we travelled eastward — a pattern exhibited in the GARP model. The Kappa statistic is often used as a measure of overall accuracy because it incorporates all of the information contained within the confusion matrix ( Fielding & Bell, 1997 ). Based on the assessment by Landis and Koch (1977 ) that K < 0.4 is poor, the Kappa value calculated from our collection data suggests that the GARP model performed very poorly (K = 0.097). According to the ROC plot ( Fig. 3 ), the area under the curve (AUC) for the GARP model was 0.556, suggesting that using the model is no better than simply guessing or flipping a coin. Alternatively, the BIOCLIM and GLM models ( Fig. 2d–e ) not only displayed greater AUCs than the GARP model, but they were also significant predictors of the spatial distribution of Promyrmekiaphila ( Table 4 ). 3 ROC plots of GARP, BIOCLIM, and GLM models based on museum records, evaluated using recently collected material. 4 Measurements associated with the ROC plots displayed in Fig. 3 Model AUC Std. error Asymptotic sig. BIOCLIM 0.699 0.046 0.000 GARP 0.556 0.053 0.287 LOG_REG 0.662 0.052 0.002 The GARP model based solely on localities where we had collected from March–June 2005 is shown in Fig. 2(c) , and displays an overall effect of range reduction — a further restriction of the area considered to be a suitable habitat, although the model does exhibit an increase in the probability of occurrence in many cells in the immediate vicinity of the collection sites along the transects, as would be expected ( Fig. 2f–g ). However, we suspect that the environmental conditions are similar to the north and south of the transects — areas where GARP showed only a moderate increase in the number of cells of predicted presence, even after the addition of the March–June data. 
The addition of the new data did convert the areas of moderate probability of species occurrence near the eastern end of the transects to areas of predicted absence. Although no true absence data were employed in the GARP models, we did not find Promyrmekiaphila in those locations. Locality data for Promyrmekiaphila collected from March–June 2005 can be accessed at http://www.mygalomorphae.org . DISCUSSION Compared to the other more computationally simplistic approaches (BIOCLIM and GLM) examined for our data, the GARP‐produced model failed miserably in its ability to accurately reconstruct the distribution of Promyrmekiaphila throughout northern and central California. Because the BIOCLIM and GLM models were produced using a slightly different set of predictive variables, conclusions that can be drawn from comparisons to the GARP model are obviously limited. Nevertheless, these simpler models are not as computationally intensive and are much easier to produce — qualities that would be attractive to systematists not specializing in spatial modelling. We attribute the failure of the GARP model to a number of issues related to the restricted spatial distribution of the study taxon and other issues related to the opaque nature of the GARP algorithm. Primarily, resolution, or spatial scale, and predictor variables are two components that must be prudently chosen when constructing any distributional model. Organisms, like Promyrmekiaphila, with low vagility and poor dispersal capabilities may further complicate these decisions due to their inherent isolation. Problems associated with resolution of spatial data in GIS analyses can be particularly pervasive when parcels of suitable habitat are disjunct. The area that a small, non‐vagile species inhabits may be only a few meters or less; thus, finer resolution usually provides better predictive ability in models ( Guisan & Thuiller, 2005 ). 
Thus, an increase in data resolution is one of the primary factors necessary to increase model prediction accuracy, particularly for areas with microtopographic variation ( Guisan & Zimmermann, 2000 ). Such a complex landscape defines our study area in northern California. Guisan and Thuiller (2005 ) argue that, for sessile organisms, not only must the combination of all suitable conditions be present within a cell, but they must all be present at the same specific location within the cell. This requirement probably extends to highly non‐vagile species as well. The ecological GIS data sets (e.g. layers) used for modelling are available at various resolutions, but even the highest resolution data sets can belie the heterogeneity of the landscape. Consequently, modelling programs tend to identify large tracts as suitable (or unsuitable) although the habitat is actually heterogeneous ( Cowley ., 1999 ). Aside from issues of scale, which can overlook the isolated nature of suitable habitat, isolation clearly presents other difficulties especially for non‐vagile species. This difficulty stems from the fact that the actual species range can potentially be overpredicted. Ecological niche models define areas that are ecologically suitable for a species, but many factors may be responsible for the absence of the species in such an area. Biotic interactions (e.g. predators or competitors) may have precluded a species from an otherwise suitable area. Historical geological factors may have hampered dispersal to certain areas and potentially represent a severe limiting factor in predictive models because they are not accounted for in the model ( Guisan & Zimmermann, 2000 ). Or perhaps, the species has gone extinct as the result of natural events or, more likely, of recent human activities. All of the aforementioned factors intersect to form the species realized niche ( Soberón & Peterson, 2005 ). 
That said, it is important to note that there is some debate about whether spatial distribution models define a species fundamental niche or realized niche. Guisan and Thuiller (2005 ) state that most of the literature assumes, without proper evidence, that spatial models represent the realized niche of the species, because their observed distributions are already constrained by biotic interactions and limiting resources. A limited degree of information regarding biotic interactions and accessibility (a function of historical geological factors and dispersal capabilities) may be introduced indirectly into GARP via the pseudo‐absences ( Soberón & Peterson, 2005 ). The realized niche may be significantly smaller than the fundamental niche if the degree and influence of biotic interactions are large. For Promyrmekiaphila , we suspect that biotic interactions do not play an overwhelming role in delimiting distribution. These spiders are generalist predators known to occur frequently alongside other burrowing spiders, and they themselves have few predators, primarily parasitic pompilid wasps, which, based on extensive field observations, do not decimate populations. Accessibility to suitable habitat, however, is expected to be a major factor that severely constrains the realized niche as a much smaller portion of the fundamental niche due to the fossorial nature and limited dispersal capabilities of these spiders. In addition, the genus may not have had time to reach a state of equilibrium with its environment ( Guisan & Thuiller, 2005 ). Animals with adequate dispersal capabilities are often able to reach isolated patches of suitable habitat, but highly non‐vagile species are more constrained, and even in relatively continuous areas of suitable habitat, they may not have been present long enough to have dispersed throughout the area. 
Although some work has been done with plants which incorporates dispersal capabilities into the model ( Iverson ., 1999 ; Dullinger ., 2004 ), we know of no GIS layers that could incorporate this information; therefore, some degree of overprediction is expected. Overprediction, also called commission error or false positives, is desirable to some extent. Whereas areas predicted as absent may be falsified by the discovery of the species at that location, areas designated as absent of spiders can never truly be proved as such, even after extensive searching. For this reason, the commission error is often referred to, more correctly, as a commission index. Extremely low commission index means the data have been overfitted — essentially, the model only predicts the species to occur in and around the original presence localities used as training points in the model. Areas of overprediction are, theoretically, places of suitable habitat. Although all of the areas of predicted presence will not be occupied, these fragments of predicted occurrence can potentially be used to test evolutionary hypotheses regarding vicariant speciation. For instance, some areas of intermediate habitat may have acted as corridors between areas of currently occupied habitat. Overprediction has also been used to predict the potential distributions of invasive species including several invertebrate species ( Ganeshaiah ., 2003 ; Soberon ., 2001 ; Roura‐Pascual ., 2005 ). However, one of the defining characteristics of any invasive species is the ability to disperse easily and widely ( Richardson ., 2000 ). For any organism, dispersal from one area of suitable habitat to another may be prevented by unfavourable environmental conditions in the intermediate area which act as ecological barriers. However, overprediction becomes a major factor with non‐vagile species because the degree of isolation becomes increasingly larger as the vagility of an organism decreases. 
This may result in too many false positives, in which case the model is rendered a failure. Many of the areas determined to be overpredicted as a result of our recent collection efforts in the field did not match our expectations of suitable habitat. We made numerous stops in Siskiyou County, at the northern reaches of the study area, from which Promyrmekiaphila is not known but where it is predicted to occur by the GARP model. The absence of Promyrmekiaphila at those locations could be due to the inability of these spiders to disperse over great distances, or to the presence of uninhabitable areas lying between the sites in Siskiyou and the northernmost area from which they are known (near Redding, Shasta County). However, because the areas in Siskiyou County where the GARP model predicted high probabilities of occurrence appeared to us to be too rocky and lacking in the amount of soil necessary to constitute suitable Promyrmekiaphila habitat, we suspect that these overpredicted areas are an artefact of model failure, due perhaps to the lack of necessary soil information contained in the environmental layers. Not including the factors critical to delimiting species distributions may cause GARP models to fail. The underlying premise of these models is that predictable relationships exist between the occurrence of a species and the environment. However, for many species, especially fossorial invertebrate taxa, very little is known with regard to their habitat requirements. This is the case for Promyrmekiaphila, a genus for which very little habitat data have been published. For our analyses we selected environmental parameters that we believe play an important role in delimiting the spatial distribution of these spiders. However, it is possible that a critical environmental factor was omitted.
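The two error measures weighed in this discussion can be made concrete with a minimal sketch. It assumes that field visits are tallied into four counts (species found or not found, crossed with model prediction of presence or absence), and it measures commission against field non-detections, which is only one possible convention. The class and function names are hypothetical, and the numbers in the usage note are invented for illustration; they are not results from this study.

```python
from dataclasses import dataclass


@dataclass
class ValidationCounts:
    """Tallies of field-visited sites (all values are supplied by the user)."""
    found_predicted_present: int      # species found, model predicted presence
    found_predicted_absent: int       # species found, model predicted absence
    not_found_predicted_present: int  # species not found, model predicted presence
    not_found_predicted_absent: int   # species not found, model predicted absence


def omission_rate(c: ValidationCounts) -> float:
    """Fraction of sites where the species was found but absence was predicted."""
    found = c.found_predicted_present + c.found_predicted_absent
    return c.found_predicted_absent / found if found else float("nan")


def commission_index(c: ValidationCounts) -> float:
    """Fraction of non-detection sites that the model predicted as present.

    A non-detection is not a proven absence, so this is treated as an
    index rather than a true error rate.
    """
    not_found = c.not_found_predicted_present + c.not_found_predicted_absent
    return c.not_found_predicted_present / not_found if not_found else float("nan")
```

For example, with hypothetical tallies `ValidationCounts(12, 10, 5, 45)`, `omission_rate` gives 10/22 and `commission_index` gives 5/50. Keeping the two measures separate reflects the asymmetry described above: a discovered population falsifies a predicted absence outright, whereas a site where no spiders were found only weakly contradicts a predicted presence.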
Guisan and Thuiller (2005) suggest that fine‐scale resolution, as exhibited in Promyrmekiaphila, is controlled by a patchy distribution of resources that are assimilated by the species, such as water. Because these organisms are sedentary burrowers, it is possible that specific soil features that were not captured in our soil data are critical. The addition of environmental layers such as the water‐holding capacity of the soil may improve the model. Outside of this study, most of our collecting efforts are focused on areas that appear to offer suitable habitat. In contrast, during our fieldwork in 2005, we directed much of our collecting effort to areas of predicted absence, because the degree to which collecting results in those areas conform to the model provides a gauge of model accuracy (Anderson et al., 2003). Although the vast majority of the absence points fell into cells of predicted absence, the GARP models did exhibit a high degree of omission error: almost half of the total points identified as present upon observation had been misidentified by the GARP model as absent. Although the relative costs of commission and omission errors will vary according to the particular aim of the model, omission error is particularly egregious when using distribution models to determine areas for conservation consideration, because the omission of areas that actually contain populations may make the species susceptible to the loss of genetic diversity (Fielding , 1997). Stockwell and Peterson (2002) suggest that too many classes within an environmental layer may result in high omission error. One way to minimize this could be to combine some of the classes within the categorical layers. Another potential problem with GARP modelling occurs when the phylogeny of the study organism is unresolved. Currently, three nominal sibling species compose the genus Promyrmekiaphila, but the correct taxonomic position of these species is unknown.
If the species are closely related sister taxa, speciation arose via vicariance, and the niche is conserved, then the model should predict all species equally well. Conversely, if the niche is not conserved, then the model may not perform well. In our study there are three areas that differ significantly from the redwood and mixed hardwood forests from which most of the specimens are known. This might also suggest areas of secondary contact where the predicted ranges of two species overlap. In the future, generalized additive models (GAMs) could be used to generate a model from presence‐only data and computer‐generated pseudo‐absences. This approach, although not as accurate as using true presence/absence data, has been shown to provide significantly predictive models (Ferrier & Watson, 1997; Zaniewski et al., 2002) with results that are easy to read and easy to interpret (Guisan & Thuiller, 2005). This approach could provide a useful alternative to GARP when the spatial distribution of a species is desired, but exhaustive sampling is not feasible.

CONCLUSION

A GARP model created from historical museum collection records of a genus of trapdoor spiders proved to be an unreliable guide for collecting in the field and did not provide an adequate representation of their true present‐day spatial distribution. We suspect that the extremely sedentary nature of trapdoor spiders and the absence of significant environmental layers are the primary culprits causing the inaccuracy of the GARP model produced in our study. These spiders do not require large tracts of suitable habitat, and the small patches where they are often located are too fine‐scale to be represented in the currently available environmental data sets. The GARP model exhibits a large degree of overprediction, which is somewhat expected and not necessarily a negative attribute of the models, as discussed earlier.
However, despite the large degree of overprediction, the GARP model failed to predict spiders in several large‐scale areas where spiders were present. For organisms of low vagility, most currently available data sets are at a resolution too coarse to provide accurate predictions. GARP models may perform better at a coarser resolution; however, this would provide only an ‘extent of occurrence’ map. These maps represent a gross geographical scale and, because the organism is unlikely to inhabit all areas within the extent of occurrence, primarily delineate the extreme boundaries of the distributional range. A more realistic depiction of the actual range is represented by ‘areas of occupancy’, a network of sometimes disjunct patches of habitat of various levels of suitability (Gaston, 1994). GARP models have previously been used to characterize the distribution of more vagile taxa and are alleged to be useful in predicting future distributions of invasive species, as well as in modelling potential habitat translocations due to climate change. However, based on the results of our study, our comparisons to other modelling approaches, and the current unavailability of data at an appropriate resolution, we do not recommend that GARP be used as an approach to modelling species distributions of non‐vagile trapdoor spiders and suspect that this recommendation should be extended, at a minimum, to all non‐vagile invertebrate species. Moreover, we do not recommend GARP as a first approach to modelling species distributions for systematists because, as demonstrated, models generated by simpler methods with fewer variable types perform better. Compared with BIOCLIM or GLM, GARP is more difficult to implement and is computationally intensive, and, because it is a ‘black box’ approach, algorithm outcomes are virtually impossible to interpret and troubleshoot.

ACKNOWLEDGEMENTS

Brent Hendrixson and three anonymous reviewers made useful comments on an earlier draft of this paper.
This work was supported by National Science Foundation Grant DEB 0315160.

Journal

Diversity and Distributions, Wiley

Published: Jan 1, 2006
