Purpose – This paper proposes a framework for describing and evaluating the representativeness of a small set of search results extracted from the original results, a property that is desirable for information retrieval in enterprise information systems.
Design/methodology/approach – The paper proposes a combined measure, RF_β, to evaluate the extracted small set in terms of coverage and redundancy. Experiments were conducted on three different extraction strategies to evaluate their representativeness, i.e. coverage and redundancy.
Findings – From both intuitive and experimental perspectives, the proposed coverage measure, redundancy measure and RF_β measure could effectively evaluate representativeness.
Research limitations/implications – The search results, e.g. in the form of documents and texts, are modeled using a vector space model and cosine similarity. Semantic and linguistic models could be introduced in further research to improve the proposed measures.
Practical implications – With the rapidly growing need for information retrieval in enterprise information systems, the representativeness of search results becomes more desirable and important for search engine users. Well-designed representativeness measures will help them achieve satisfactory results.
Originality/value – The originality of the paper lies in its definition of the representativeness of a small set of search results extracted from the original results. The definition focuses on two aspects, coverage rate and redundancy rate, from both intuitive and experimental perspectives.
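The abstract does not give the formal definitions of coverage, redundancy or RF_β, so the following Python sketch is only an illustration of how such measures might be computed over a vector space model with cosine similarity. Everything in it is an assumption made for illustration: the similarity threshold, the function names (`vectorize`, `coverage`, `redundancy`, `combined_score`), and the F-measure-style weighted harmonic mean used to combine coverage with non-redundancy. The paper's own definitions may differ.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (a very simple vector space model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coverage(original, extracted, threshold=0.5):
    """ASSUMED definition: share of original results that are similar
    (cosine >= threshold) to at least one extracted result."""
    if not original:
        return 0.0
    covered = sum(
        1 for d in original
        if any(cosine(d, e) >= threshold for e in extracted)
    )
    return covered / len(original)


def redundancy(extracted, threshold=0.5):
    """ASSUMED definition: share of extracted results that are similar
    (cosine >= threshold) to some other extracted result."""
    n = len(extracted)
    if n < 2:
        return 0.0
    redundant = sum(
        1 for i, e in enumerate(extracted)
        if any(cosine(e, f) >= threshold
               for j, f in enumerate(extracted) if j != i)
    )
    return redundant / n


def combined_score(cov, red, beta=1.0):
    """ASSUMED combination, borrowed from the F-measure: weighted harmonic
    mean of coverage and non-redundancy (1 - redundancy)."""
    non_red = 1.0 - red
    denom = beta ** 2 * cov + non_red
    return (1 + beta ** 2) * cov * non_red / denom if denom else 0.0


if __name__ == "__main__":
    docs = ["apple banana", "apple banana", "car engine", "car engine repair"]
    vectors = [vectorize(d) for d in docs]
    subset = [vectors[0], vectors[2]]  # one result per topic
    cov, red = coverage(vectors, subset), redundancy(subset)
    print(cov, red, combined_score(cov, red))
```

Under these assumptions, a subset that picks one result per topic scores high on coverage and low on redundancy, while a subset of near-duplicates is penalized on both counts. Raising `beta` above 1 shifts the combined score toward non-redundancy, and lowering it below 1 shifts it toward coverage.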
Journal of Enterprise Information Management – Emerald Publishing
Published: Jul 26, 2011
Keywords: Information retrieval; Representativeness; Coverage; Redundancy; Set theory; Information systems