
Cross-Validation With Confidence

Abstract

Cross-validation is one of the most popular methods for model and tuning-parameter selection in statistics and machine learning. Despite its wide applicability, traditional cross-validation methods tend to overfit because they ignore the uncertainty in the testing sample. We develop a novel, statistically principled inference tool based on cross-validation that takes this uncertainty into account. The method outputs a set of highly competitive candidate models that contains the optimal one with guaranteed probability. As a consequence, it can achieve consistent variable selection in a classical linear regression setting, where existing cross-validation methods require unconventional split ratios. When used for tuning-parameter selection, the method offers a trade-off between prediction accuracy and model interpretability different from that of existing variants of cross-validation. We demonstrate the performance of the proposed method in several simulated and real data examples. Supplemental materials for this article can be found online.
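The abstract describes a procedure that returns a confidence set of models rather than a single winner. As a rough illustration of that general idea only (not the paper's actual test), one can compare each candidate's cross-validated losses against the best candidate's via a paired test and keep every model that is not significantly worse. The function below is a minimal numpy sketch under that simplification; the function name and the normal-approximation p-value are this sketch's own choices, not the authors' procedure.

```python
import numpy as np
from math import erf, sqrt

def cv_confidence_set(loss_matrix, alpha=0.05):
    """Illustrative sketch: keep all models whose mean CV loss is not
    significantly worse than the empirically best model's.

    loss_matrix : (n_test, n_models) array of per-observation CV losses.
    Returns the indices of the retained models (always includes the best).
    """
    n, m = loss_matrix.shape
    means = loss_matrix.mean(axis=0)
    best = int(np.argmin(means))          # empirically best candidate
    keep = []
    for j in range(m):
        if j == best:
            keep.append(j)
            continue
        d = loss_matrix[:, j] - loss_matrix[:, best]   # paired loss differences
        se = d.std(ddof=1) / sqrt(n) + 1e-12           # guard against zero spread
        t = d.mean() / se
        p = 1.0 - 0.5 * (1.0 + erf(t / sqrt(2.0)))     # one-sided P(Z > t)
        if p > alpha:                                  # not significantly worse
            keep.append(j)
    return keep
```

The paper's actual procedure is more careful than this sketch, in particular about the multiplicity of comparing many candidate models at once; the code above only conveys the "retain everything not significantly worse than the best" idea behind a model confidence set.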


Journal of the American Statistical Association, Volume 115 (532): 20 – Dec 11, 2020




Publisher
Taylor & Francis
Copyright
© 2019 American Statistical Association
ISSN
0162-1459
eISSN
1537-274X
DOI
10.1080/01621459.2019.1672556


Journal

Journal of the American Statistical Association

Published: Dec 11, 2020

Keywords: Cross-validation; Hypothesis testing; Model selection; Overfitting; Tuning parameter selection
