Purpose– The most commonly used approaches for cluster validation are based on indices but the majority of the existing cluster validity indices do not work well on data sets of different complexities. The purpose of this paper is to propose a new cluster validity index (ARSD index) that works well on all types of data sets. Design/methodology/approach– The authors introduce a new compactness measure that depicts the typical behaviour of a cluster where more points are located around the centre and lesser points towards the outer edge of the cluster. A novel penalty function is proposed for determining the distinctness measure of clusters. Random linear search-algorithm is employed to evaluate and compare the performance of the five commonly known validity indices and the proposed validity index. The values of the six indices are computed for all nc ranging from (ncmin, ncmax) to obtain the optimal number of clusters present in a data set. The data sets used in the experiments include shaped, Gaussian-like and real data sets. Findings– Through extensive experimental study, it is observed that the proposed validity index is found to be more consistent and reliable in indicating the correct number of clusters compared to other validity indices. This is experimentally demonstrated on 11 data sets where the proposed index has achieved better results. Originality/value– The originality of the research paper includes proposing a novel cluster validity index which is used to determine the optimal number of clusters present in data sets of different complexities.
International Journal of Intelligent Computing and Cybernetics – Emerald Publishing
Published: Jun 13, 2016