Exploration of Text Collections with Hierarchical Feature Maps

Dieter Merkl*
Department of Computer Science, Royal Melbourne Institute of Technology
723 Swanston St., Carlton, VIC 3053, Australia
dieter@mds.rmit.edu.au

Abstract

Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classification techniques are used to gain insight into the structure of the various data items contained in the text archive. In this paper we show the results from using a hierarchy of self-organizing maps to perform the text classification task. Each of the individual self-organizing maps is trained independently and becomes specialized to a subset of the input data. As a consequence, the choice of this particular artificial neural network model enables the true establishment of a document taxonomy. The benefit of this approach is a straightforward representation of document similarities combined with dramatically reduced training time. In particular, the hierarchical representation of document collections is appealing because it is the underlying organizational principle in use by librarians, providing the necessary familiarity for the user. The massive reduction in the time needed to train the artificial neural network together with its highly accurate clustering
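
The idea of independently trained maps, each specialized to a subset of the input data, can be illustrated with a minimal sketch. This is not the implementation used in the paper: the function names (`train_som`, `best_unit`, `train_hierarchy`, `locate`), the map sizes, the Gaussian neighborhood, the linear decay schedules, and the toy term-frequency vectors are all assumptions chosen for illustration. A top-level map is trained on all documents; each top-level unit that attracts documents then gets its own second-level map, trained only on those documents.

```python
import math
import random


def best_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # assumed linear decay
        radius = max(radius0 * (1.0 - frac), 0.5)  # assumed shrinking radius
        for x in rng.sample(data, len(data)):      # shuffled presentation order
            br, bc = best_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def train_hierarchy(data, top=(2, 2), child=(2, 2)):
    """Train one top-level map, then one independent child map per occupied unit."""
    top_map = train_som(data, *top)
    groups = {}
    for x in data:
        groups.setdefault(best_unit(top_map, x), []).append(x)
    # Each child map sees only the documents mapped onto its parent unit.
    children = {u: train_som(g, *child, seed=1) for u, g in groups.items()}
    return top_map, children


def locate(top_map, children, x):
    """Return (top-level unit, child unit); child is None for an unoccupied unit."""
    u = best_unit(top_map, x)
    if u not in children:
        return u, None
    return u, best_unit(children[u], x)


if __name__ == "__main__":
    # Hypothetical term-frequency vectors for six tiny "documents".
    docs = [[1.0, 0.9, 0.0, 0.0], [0.9, 1.0, 0.1, 0.0], [0.0, 0.1, 1.0, 0.9],
            [0.0, 0.0, 0.9, 1.0], [0.5, 0.5, 0.5, 0.5], [1.0, 0.0, 0.0, 1.0]]
    top_map, children = train_hierarchy(docs)
    for d in docs:
        print(d, "->", locate(top_map, children, d))
```

Because each child map is trained on only a fraction of the collection, its training cost is correspondingly smaller than that of one large flat map over all documents, which is the kind of saving the abstract refers to.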