Exploiting Clustering and Phrases for Context-Based Information Retrieval

Peter G. Anick, Digital Equipment Corporation, Stow, MA, and Brandeis University, Waltham, MA. Anick@mail.dec.com
Shivakumar Vaithyanathan, IBM Almaden Research Center, San Jose, CA. Shiv@almaden.ibm.com

Abstract

This paper explores how the synergy between document clustering and phrasal analysis can be exploited to automatically construct a context-based retrieval system. A context consists of two components: a cluster of logically related articles (its extension) and a small set of salient concepts, represented by words and phrases and organized by the cluster's key terms (its intension). At run-time, the system presents the contexts that best match the result list of a user's natural language query. The user can then choose a context and manipulate its intensional component both to browse the context's extension and to launch new searches over the entire database. We argue that the focused relevance feedback provided by contexts, at a level of abstraction higher than individual documents and lower than the database as a whole, gives users a natural way to refine vague information needs and helps to blur the distinction between searching and browsing. The Paraphrase interface, running over a database of business-related news articles, is