
Fast and simple dataset selection for machine learning


at - Automatisierungstechnik, Volume 67 (10): 10 – Oct 25, 2019

Publisher
de Gruyter
Copyright
© 2019 Walter de Gruyter GmbH, Berlin/Boston
ISSN
2196-677X
eISSN
2196-677X
DOI
10.1515/auto-2019-0010

Abstract

The task of data reduction is discussed, and a novel selection approach is proposed that makes it possible to control the optimal point distribution of the selected data subset. The approach is based on the estimation of probability density functions (pdfs). Owing to its structure, the new method can select a subset either by approximating the pdf of the original dataset or by approximating an arbitrary, desired target pdf. The new strategy evaluates the estimated pdfs only at the selected data points, which yields a simple and efficient algorithm with low computational and memory demands. The performance of the new approach is investigated in two scenarios. For selecting a representative subset of a dataset, the new approach is compared with a recently proposed, more complex method and achieves comparable results. To demonstrate its ability to match a target pdf, a uniform distribution is chosen as an example; here the new method is compared with strategies for space-filling design of experiments and shows convincing results.
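The abstract does not spell out the algorithm itself. As a rough illustration of the general idea (pick points so that the estimated pdf of the selected subset tracks a target pdf, which is either the dataset's own pdf or, e.g., a uniform one), here is a minimal greedy sketch in Python with NumPy. It is a simple stand-in under stated assumptions, not the authors' method: the Gaussian kernel, the fixed bandwidth `h`, the ratio rule `subset_density / target`, and the function names `gaussian_kernel_sums` and `greedy_pdf_selection` are all hypothetical. Unlike the approach described above, this sketch evaluates kernel sums at every candidate point, so its cost grows with the size of the full dataset.

```python
import numpy as np


def gaussian_kernel_sums(X, Y, h):
    """For each row x of X, return sum_j exp(-||x - y_j||^2 / (2 h^2)).

    This is an unnormalized Gaussian kernel density estimate built on the
    points in Y and evaluated at the points in X (memory grows as len(X) * len(Y)).
    """
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * h * h)).sum(axis=1)


def greedy_pdf_selection(X, k, h, target=None):
    """Hypothetical greedy selection of k row indices from X (shape (n, d)).

    target: positive target-density values at each row of X. If None, the
    unnormalized KDE of the full dataset is used (representative subset);
    pass np.ones(n) for a uniform target (space-filling subset).
    """
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if target is None:
        target = gaussian_kernel_sums(X, X, h)
    target = np.asarray(target, dtype=float)
    if target.shape != (n,) or np.any(target <= 0):
        raise ValueError("target must hold n positive values")

    subset_density = np.zeros(n)  # unnormalized KDE of the selected points
    selected = np.zeros(n, dtype=bool)
    order = []
    for step in range(k):
        if step == 0:
            # Start where the target density is highest (first index on ties).
            i = int(np.argmax(target))
        else:
            # Pick the candidate most under-represented relative to the target.
            # Normalization constants scale every score equally, so they do
            # not change the argmin.
            score = subset_density / target
            score[selected] = np.inf
            i = int(np.argmin(score))
        order.append(i)
        selected[i] = True
        subset_density += gaussian_kernel_sums(X, X[i:i + 1], h)
    return np.array(order, dtype=int)
```

For instance, `greedy_pdf_selection(X, 50, 0.1, target=np.ones(len(X)))` aims for a roughly space-filling subset, while omitting `target` aims to mimic the dataset's own density, so dense regions receive proportionally more points. The bandwidth `h` is a free parameter in this sketch and would need tuning in practice.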

Journal

at - Automatisierungstechnik, de Gruyter

Published: Oct 25, 2019
