
The Importance of Proper Testing of Predictor Performance

Human Mutation, Volume 36 (5) – May 1, 2015


Publisher: Wiley
Copyright: Copyright © 2015 Wiley Periodicals, Inc.
ISSN: 1059-7794
eISSN: 1098-1004
DOI: 10.1002/humu.22651

Abstract

Computational predictors are increasingly used to interpret the effects of variation. The majority of novel tools are based on machine learning, that is, methods trained to distinguish cases on the basis of known examples. Performance depends on both the data and the method's implementation. In this issue, Grimm et al. (Hum Mutat 36:513-523, 2015) systematically evaluated several tools and found that the performance of some methods drops when they are tested on independent data. As a cause, they identify two types of circularity: type 1, which arises when the same data are used for training and testing, and type 2, which occurs when many proteins contain only one type of variant, either pathogenic or neutral. If a method is trained and tested on the same data, it can be optimized for high performance on that data. Machine learning methods use features to characterize the space of the investigated property; features that are too specific to the training data reduce the method's ability to generalize. Human Mutation has published guidelines for reporting and testing these kinds of methods (Hum Mutat 34:275-282, 2013). One requirement is that the test and training data be disjoint. Systematic assessment of predictor performance has to be based on benchmarks, that is, datasets with known outcomes. Such datasets are available at VariBench.
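The requirement that training and test data be disjoint can be illustrated in code. Below is a minimal sketch of a circularity-aware evaluation, assuming scikit-learn and entirely hypothetical feature, label, and protein arrays (none of these come from the article): keeping training and test variants disjoint addresses type 1 circularity, and grouping the split by protein, so that all variants of a given protein fall on one side of the split, mitigates type 2 circularity.

```python
# Sketch of a circularity-aware evaluation protocol (synthetic, hypothetical data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical benchmark: one row per variant, with a feature vector,
# a pathogenic/neutral label, and the protein each variant belongs to.
n_variants = 1000
X = rng.normal(size=(n_variants, 10))             # variant features
y = rng.integers(0, 2, size=n_variants)           # 1 = pathogenic, 0 = neutral
proteins = rng.integers(0, 50, size=n_variants)   # protein identifier per variant

# GroupKFold keeps each fold disjoint from the training data (type 1) and
# assigns all variants of a protein to the same fold (type 2).
cv = GroupKFold(n_splits=5)
aucs = []
for train_idx, test_idx in cv.split(X, y, groups=proteins):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], scores))

print(f"Protein-grouped cross-validated AUC: {np.mean(aucs):.2f}")
```

GroupKFold is only one way to enforce group-disjoint splits; any scheme that prevents variants of the same protein from appearing in both training and test data would serve the same purpose.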

Journal

Human Mutation, Wiley

Published: May 1, 2015
