We present an unsupervised method for generating, from a textual corpus, sets of keywords, that is, words whose occurrences in a text are strongly connected with the presence of a given topic. Each of these sets, or classes, is associated with one of the main topics of the corpus, and can be used to detect the presence of that topic in any of its paragraphs by a simple keyword co-occurrence criterion. The classes are extracted from the textual data in a fully automatic way, without requiring any a priori linguistic knowledge or making any assumptions about the topics to search for. The algorithms we have developed yield satisfactory and directly usable results despite the amount of noise inherent in textual data. This is achieved through a combination of several data analysis techniques. On a corpus of archives from the French monthly newspaper Le Monde Diplomatique, we obtain 40 classes of about 30 words each that accurately characterize precise topics, and allow us to detect their occurrences with a precision of 85% and a recall of 65%.
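The abstract does not spell out the keyword co-occurrence criterion or how precision and recall were computed, so the following is only a minimal sketch under stated assumptions: a topic is flagged in a paragraph when at least `min_hits` distinct keywords from its class appear there. The names `detect_topics`, `precision_recall`, and `min_hits`, the threshold value, and the toy keyword classes are all hypothetical and are not taken from the paper.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def detect_topics(paragraph, keyword_classes, min_hits=2):
    """Return the topics whose keyword class co-occurs in the paragraph.

    Assumption (not from the paper): a topic is detected when at least
    `min_hits` distinct keywords of its class appear in the paragraph.
    """
    words = set(tokenize(paragraph))
    return {
        topic
        for topic, keywords in keyword_classes.items()
        if len(words & keywords) >= min_hits
    }


def precision_recall(predicted, actual):
    """Compute precision and recall of predicted items against actual items."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Toy keyword classes, invented purely for illustration.
keyword_classes = {
    "energy": {"oil", "gas", "pipeline"},
    "finance": {"bank", "debt", "loan"},
}

paragraph = "The pipeline carries oil and gas; the bank watched."
detected = detect_topics(paragraph, keyword_classes)
```

Here `detected` contains only the `energy` topic, because three of its keywords occur in the paragraph while `finance` has a single hit. Raising `min_hits` would be expected to trade recall for precision, which is one plausible reading of why the reported precision is higher than the recall.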
Intelligent Data Analysis – IOS Press
Published: Jan 1, 2005