DOI: 10.1007/s00357-017- -
rCOSA: A Software Package for Clustering Objects
on Subsets of Attributes
Maarten M. Kampert
Leiden University, The Netherlands
Jacqueline J. Meulman
Leiden University, The Netherlands and Stanford University, USA
Jerome H. Friedman
Stanford University, USA
Abstract: rCOSA is a software package interfaced to the R language. It
implements statistical techniques for clustering objects on subsets of attributes
in multivariate data. The main output of COSA is a dissimilarity matrix that
one can subsequently analyze with a variety of proximity analysis methods.
Our package extends the original COSA software (Friedman and Meulman,
2004) by adding functions for hierarchical clustering methods, least squares
multidimensional scaling, partitional clustering, and data visualization.
Although many publications cite the COSA paper by Friedman and Meulman
(2004), only a small number of them actually use the COSA program.
This can be attributed to the fact that the original implementation is not
very easy to install and use. Moreover, the available software is out of date.
Here, we introduce an up-to-date software package and clear guidance for
using this advanced technique. The software package and related links are
available for free at: https://github.com/mkampert/rCOSA.
Keywords: Distance-based clustering; Subsets of variables; Feature selection;
Targeted clustering; Mixtures of numeric and categorical variables;
Clustering in R; Multidimensional scaling; Proximities; Dissimilarities; Omics
Corresponding Author’s Address: M.M. Kampert, Mathematical Institute,
Leiden University, Niels Bohrweg 1, 2333 CA Leiden, email: mkampert@math.
Published online: 3 November 2017
Journal of Classification 34:514-547 (2017)