Workshop on Language Modeling and Information Retrieval
May 31-June 1, 2001
Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

The language modeling approach to information retrieval (IR) is a new framework that has been proposed and developed within the past five years, although its roots in the IR literature go back more than twenty years. Research carried out at a number of sites has confirmed that the language modeling approach is a theoretically attractive and potentially very effective probabilistic framework for building IR systems.

The central computational device in this framework is a language model: a probabilistic model for generating natural language text. The most familiar and basic language models are simply unigram word models, built from the relative frequencies of the words appearing in a document. More sophisticated language models account for word order, phrases, and changes in language statistics over time and across document collections.

The use of language models is attractive for several reasons. For example, building an IR system with language models allows us to reason about the design and empirical performance of the system in a principled way, using the tools of probability theory. In addition, we can leverage the
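As an illustration of the basic unigram model described above, the following is a minimal sketch, not taken from the workshop material. It assumes naive whitespace tokenization and scores documents by query likelihood. The source does not specify a smoothing method, so the sketch assumes Jelinek-Mercer smoothing, which mixes each document model with a collection model through a weight `lam`. All function names here are hypothetical.

```python
import math
from collections import Counter


def unigram_model(text: str) -> dict[str, float]:
    """Maximum-likelihood unigram model: P(w | d) = count(w, d) / |d|."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    counts = Counter(tokens)
    total = len(tokens)
    # Relative frequency of each word in the text.
    return {word: count / total for word, count in counts.items()}


def query_log_likelihood(query: str,
                         doc_model: dict[str, float],
                         collection_model: dict[str, float],
                         lam: float = 0.5) -> float:
    """log P(q | d), assuming Jelinek-Mercer smoothing (a hypothetical choice here).

    Each query word gets the probability
    lam * P(w | d) + (1 - lam) * P(w | collection),
    so words missing from the document still receive non-zero mass.
    """
    score = 0.0
    for word in query.lower().split():
        p = lam * doc_model.get(word, 0.0) + (1 - lam) * collection_model.get(word, 0.0)
        if p == 0.0:
            return float("-inf")  # word unseen in both document and collection
        score += math.log(p)
    return score


# Usage: rank two toy documents for a two-word query.
docs = ["language models for retrieval", "probabilistic retrieval models"]
collection = unigram_model(" ".join(docs))
models = [unigram_model(d) for d in docs]
scores = [query_log_likelihood("language retrieval", m, collection) for m in models]
```

Without smoothing, a document missing any single query word would receive probability zero. Mixing in the collection model is one common way to avoid this.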