Evaluating presence–absence models in ecology: the need to account for prevalence


Abstract

1. Models for predicting the distribution of organisms from environmental data are widespread in ecology and conservation biology. Their performance is invariably evaluated from the percentage success at predicting occurrence at test locations.

2. Using logistic regression with real data from 34 families of aquatic invertebrates in 180 Himalayan streams, we illustrate how this widespread measure of predictive accuracy is affected systematically by the prevalence (i.e. the frequency of occurrence) of the target organism. Many evaluations of presence–absence models by ecologists are inherently misleading.

3. With the same invertebrate models, we examined alternative performance measures used in remote sensing and medical diagnostics. We particularly explored receiver-operating characteristic (ROC) plots, from which were derived (i) the area under each curve (AUC), considered an effective indicator of model performance independent of the threshold probability at which the presence of the target organism is accepted, and (ii) optimized probability thresholds that maximize the percentage of true absences and presences that are correctly identified. We also evaluated Cohen's kappa, a measure of the proportion of all possible cases of presence or absence that are predicted correctly after accounting for chance effects.

4. AUC measures from ROC plots were independent of prevalence, but highly significantly correlated with the much more easily computed kappa. Moreover, when applied in predictive mode to test data, models with thresholds optimized by ROC erroneously overestimated true occurrence among scarcer organisms, often those of greatest conservation interest. We advocate caution in using ROC methods to optimize thresholds required for real prediction.

5. Our strongest recommendation is that ecologists reduce their reliance on prediction success as a performance measure in presence–absence modelling. Cohen's kappa provides a simple, effective, standardized and appropriate statistic for evaluating or comparing presence–absence models, even those based on different statistical algorithms. None of the performance measures we examined tests the statistical significance of predictive accuracy, and we identify this as a priority area for research and development.
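
The three measures discussed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the formulas are the standard textbook definitions of overall prediction success, Cohen's kappa and the rank-based (Mann-Whitney) AUC, and the example counts are invented for illustration rather than taken from the Himalayan data set.

```python
def confusion_counts(observed, predicted_prob, threshold=0.5):
    """Return (a, b, c, d): true presences, false presences,
    false absences and true absences at the given threshold."""
    a = b = c = d = 0
    for obs, prob in zip(observed, predicted_prob):
        predicted_present = prob >= threshold
        if predicted_present and obs:
            a += 1
        elif predicted_present and not obs:
            b += 1
        elif obs:
            c += 1
        else:
            d += 1
    return a, b, c, d


def prediction_success(a, b, c, d):
    """Proportion of all sites classified correctly."""
    return (a + d) / (a + b + c + d)


def cohens_kappa(a, b, c, d):
    """Agreement between observed and predicted classes, corrected for chance."""
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement expected from the row and column totals alone.
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    if p_chance == 1.0:
        return 0.0  # degenerate table: only one class observed and predicted
    return (p_observed - p_chance) / (1.0 - p_chance)


def auc(observed, predicted_prob):
    """Threshold-free AUC: probability that a randomly chosen presence
    receives a higher predicted value than a randomly chosen absence
    (ties count as one half)."""
    presences = [p for o, p in zip(observed, predicted_prob) if o]
    absences = [p for o, p in zip(observed, predicted_prob) if not o]
    wins = sum(
        1.0 if pp > pa else 0.5 if pp == pa else 0.0
        for pp in presences
        for pa in absences
    )
    return wins / (len(presences) * len(absences))


# Invented example: a taxon present at 5 of 100 sites, and a "model"
# that predicts absence everywhere.
rare_taxon_table = (0, 0, 5, 95)
print(prediction_success(*rare_taxon_table))  # 0.95
print(cohens_kappa(*rare_taxon_table))        # 0.0
```

In this invented case the uninformative model scores 95% prediction success simply because the taxon is rare, while kappa is zero, which is the kind of prevalence effect the abstract warns about.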

Journal

Journal of Applied Ecology, Wiley

Published: Jan 1, 2001

