journal article
Open Access Collection
Evolutionary Methods for Unsupervised Feature Selection Using Sammon's Stress Function
Saxena, Amit; Pal, Nikhil R.; Vora, Megha
2010 Fuzzy Information and Engineering
doi: 10.1007/s12543-010-0047-4
AbstractIn this paper, four methods are proposed for feature selection in an unsupervised manner by using genetic algorithms. The proposed methods do not use the class label information but select a set of features using a task independent criterion that can preserve the geometric structure (topology) of the original data in the reduced feature space. One of the components of the fitness function is Sammon's stress function which tries to preserve the topology of the high dimensional data when reduced into the lower dimensional one. In this context, in addition to using a fitness criterion, we also explore the utility of unfitness criterion to select chromosomes for genetic operations. This ensures higher diversity in the population and helps unfit chromosomes to become more fit. We use four different ways for evaluation of the quality of the features selected: Sammon error, correlation between the inter-point distances in the two spaces, a measure of preservation of cluster structure found in the original and reduced spaces and a classifier performance. The proposed methods are tested on six real data sets with dimensionality varying between 9 and 60. The selected features are found to be excellent in terms of preservation topology (inter-point geometry), cluster structure and classifier performance. We do not compare our methods with other methods because, unlike other methods, using four different ways we check the quality of the selected features by finding how well the selected features preserve the “structure” of the original data.