Guessing Morphology from Terms and Corpora

Christian Jacquemin
Institut de Recherche en Informatique de Nantes
2, chemin de la Houssinière - BP 92208
44322 NANTES Cedex 3, FRANCE
Christian.Jacquemin@irin.univ-nantes.fr

Abstract

This study proposes an algorithm for automatically acquiring morphological links between words. This algorithm relies on the concurrent use of a corpus and a list of multi-word terms, and does not require any prior linguistic knowledge. The four steps of the algorithm are (1) single-word truncation, (2) conflation of multi-word terms, (3) classification and filtering, and (4) clustering of conflation classes. At each step a precise evaluation is performed in order to choose the optimal parameters. The final results indicate a clustering of 45% of the classes with a precision of 87%. The derivational knowledge acquired through this method can be used for conceiving a domain-oriented stemmer for scientific and technical corpora.

1 Introduction

[...] of data are geared towards a better understanding of language, not towards [...]. Several problems remain in the exploitation of derivational linguistics in IR [...] How can we know whether two similar words are morphologically related? Imported and important look very similar but have no synchronic meaning in common. On the contrary, gifl