Threefold versus fivefold cross-validation and individual versus average data in predictive regression modelling of machining experimental data
Abstract
Model selection and validation are critical in predicting the performance of manufacturing processes. Proper selection of variables helps minimize the model mismatch error, proper selection of models helps reduce the model estimation error, and proper validation of models helps minimize the model prediction error. In this paper, the literature is reviewed and a rigorous procedure is proposed for the selection and cross-validation (CV) of predictive regression models. Experimental data from a turning surface roughness study are used to demonstrate how to select and validate such models. In particular, different data splitting methods are compared: fivefold CV versus threefold CV, and the individual data versus the average data. The study reveals no statistically significant difference between fivefold CV and threefold CV, or between the individual and the average data, in subset selection and CV of predictive regression models. Consequently, based on this and other similar empirical studies, threefold CV may be used instead of fivefold or tenfold CV, and either the individual or the average data may be used, to reduce the computational cost of predictive regression modelling of experimental data.
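To make the comparison of data splitting methods concrete, the following minimal sketch runs threefold and fivefold CV on the same data and reports the mean prediction error of each. It is an illustration under stated assumptions, not the procedure used in the study: the data are synthetic stand-ins (not the turning surface roughness measurements), the model is a plain first-order least-squares fit, and the function name `kfold_cv_rmse` and all parameter values are hypothetical.

```python
import numpy as np


def kfold_cv_rmse(X, y, k, seed=0):
    """Return the per-fold prediction RMSE of an ordinary least-squares fit."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))      # shuffle the runs before splitting
    folds = np.array_split(idx, k)     # k roughly equal validation folds
    rmse = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Add an intercept column and fit on the training folds only.
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        resid = y[test] - A_test @ beta
        rmse.append(float(np.sqrt(np.mean(resid ** 2))))
    return np.array(rmse)


# Synthetic stand-in data (assumption): 60 runs, 3 predictors, small noise.
rng = np.random.default_rng(42)
n = 60
X = rng.uniform(0.0, 1.0, size=(n, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0.0, 0.1, size=n)

rmse3 = kfold_cv_rmse(X, y, k=3)
rmse5 = kfold_cv_rmse(X, y, k=5)
print("threefold mean RMSE:", rmse3.mean())
print("fivefold mean RMSE:", rmse5.mean())
```

Threefold CV fits the model three times instead of five, which is the source of the computational saving; whether the two prediction error estimates differ for a given data set would still need to be checked with an appropriate statistical test.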