Purpose – This paper proposes a framework for describing and evaluating the representativeness of a small set of search results extracted from the original results, a property that is desirable for information retrieval in enterprise information systems.
Design/methodology/approach – The paper proposes a combined measure, RF β , that evaluates the extracted small set in terms of coverage and redundancy. Data experiments were conducted on three different extraction strategies to evaluate their representativeness, i.e. coverage and redundancy.
Findings – From both intuitive and experimental perspectives, the proposed coverage measure, redundancy measure and RF β measure could effectively evaluate representativeness.
Research limitations/implications – The search results, e.g. documents and texts, are modeled using a vector space model and cosine similarity. Semantic and linguistic models could be introduced in further research to improve the proposed measures.
Practical implications – With the rapidly growing need for information retrieval in enterprise information systems, the representativeness of search results becomes more desirable and important for search engine users. Well-designed representativeness measures will help them achieve satisfactory results.
Originality/value – The originality of the paper lies in its definition of the representativeness of a small set of search results extracted from the original results. The definition focuses on two aspects, coverage rate and redundancy rate, considered from both intuitive and experimental perspectives.
Journal of Enterprise Information Management – Emerald Publishing
Published: Jul 26, 2011
Keywords: Information retrieval; Representativeness; Coverage; Redundancy; Set theory; Information systems
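The abstract names the ingredients (a vector space model, cosine similarity, a coverage measure, a redundancy measure, and a combined RF β score) but does not give their formulas. The sketch below is therefore only an illustration under assumed definitions, not the paper's method: `coverage` is assumed to be the share of original results that are similar enough to at least one extracted result, `redundancy` is assumed to be the mean pairwise similarity inside the extracted set, and `combined_score` is assumed to follow the familiar F-beta form applied to coverage and non-redundancy. The function names, the `threshold` parameter and the sample texts are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Term-frequency vector for a lower-cased, whitespace-tokenised text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coverage(original, extracted, threshold=0.5):
    """ASSUMED definition: fraction of original results whose best cosine
    similarity to any extracted result reaches the threshold."""
    if not original or not extracted:
        return 0.0
    covered = sum(
        1 for d in original
        if max(cosine(d, e) for e in extracted) >= threshold
    )
    return covered / len(original)


def redundancy(extracted):
    """ASSUMED definition: mean pairwise cosine similarity in the extracted set."""
    pairs = list(combinations(extracted, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def combined_score(cov, red, beta=1.0):
    """ASSUMED F-beta-style combination of coverage and non-redundancy.
    This is not claimed to be the paper's RF-beta formula."""
    non_red = 1.0 - red
    denom = beta ** 2 * non_red + cov
    return (1 + beta ** 2) * non_red * cov / denom if denom else 0.0


# Hypothetical search results and a two-item extracted subset.
results = [tf_vector(t) for t in (
    "enterprise search ranking",
    "enterprise search ranking",
    "document clustering methods",
    "top k query processing",
)]
small_set = [results[0], results[2]]

cov = coverage(results, small_set)
red = redundancy(small_set)
print(cov, red, combined_score(cov, red))
```

Under these assumptions, a small set scores well when it stays close to most of the original results while its own members stay dissimilar to each other, which matches the intuition of representativeness described in the abstract.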