Summary of the SIGIR 2003 Workshop on Defining Evaluation Methodologies for Terabyte-Scale Test Collections

Ian Soboroff and Ellen Voorhees
National Institute of Standards and Technology, Gaithersburg, MD
ian.soboroff@nist.gov

Nick Craswell
CSIRO, Canberra, ACT, Australia
Nick.Craswell@csiro.au

Introduction

Early retrieval test collections were small, allowing relevance judgments to be based on an exhaustive examination of the documents, but limiting the general applicability of the findings. Karen Sparck Jones and Keith van Rijsbergen proposed a way of building significantly larger test collections by using pooling, a procedure adopted and subsequently validated by TREC. Now TREC-sized collections (several gigabytes of text and a few million documents) are small for some realistic tasks, but current pooling practices do not scale to substantially larger document sets.

This article summarizes a workshop held at SIGIR 2003 in Toronto, Canada, the goal of which was to develop an evaluation methodology for terabyte-scale document collections. The outcome of the workshop was a proposal for a new TREC track to investigate ad hoc retrieval on a collection of 100M web pages. In particular, we began by assuming the existence of a collection of several hundred million web pages, and discussed methods