"Real World" Searching Panel at SIGIR 97 Reported by Michael Lesk The SIGIR panel session on "Real World IR" emphasized content, practicality, and the lack of relevance of most SIGIR papers to the search business. Doug Cutting, Jan Pedersen, Terry Noreault and Matt Koll participated. Doug Cutting works at Excite, a web search engine which indexes 50 million pages, using 50 GB of disk space for its index. Their peak load is hundreds of queries per second, and their revenue comes from advertising at about 2 cents per search. About 5% of their gross revenue is available for search hardware. Once peak-load and redundancy needs are factored in, only about a hundredth of a cent per search remain. This means that the hardware to search the fifty million pages at one second per query cannot cost more than $6,000. Their great need is for fast throughput - they can meet their cost objective only by using cheap features. Thus, they have term suggestions, but not clustering. Many very practical considerations intrude. Users try to fool the search engines by stuff'mg their pages, i.e. adding words believed to be frequently searched for in ways which will not display (in