A THEORETICAL BASIS FOR THE USE OF COOCCURRENCE DATA IN INFORMATION RETRIEVAL

Journal of Documentation, Volume 33 (2): 14 – Feb 1, 1977

Publisher: Emerald Publishing
Copyright © Emerald Group Publishing Limited
ISSN: 0022-0418
DOI: 10.1108/eb026637

Abstract

This paper provides a foundation for a practical way of improving the effectiveness of an automatic retrieval system. Its main concern is the weighting of index terms as a device for increasing retrieval effectiveness. Previously, index terms have been assumed to be independent, for the good reason that a very simple weighting scheme can then be used. In reality, index terms are most unlikely to be independent. This paper explores one way of removing the independence assumption: instead of assuming independence, the extent of the dependence between index terms is measured and used to construct a nonlinear weighting function. In a practical situation, the values of some of the parameters of such a function must be estimated from small samples of documents, so a number of estimation rules are discussed and one in particular is recommended. Finally, the feasibility of the computations required for a nonlinear weighting scheme is examined.
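To make the contrast between the independence assumption and a dependence-aware scheme more concrete, here is a minimal Python sketch of the two ingredients the abstract mentions: per-term weights under independence, and a measurement of dependence between index terms estimated from a small sample. It is an illustration under stated assumptions, not the paper's method. Documents are modeled as sets of binary index terms; the independence baseline uses the familiar relevance weight log[p(1-q)/(q(1-p))]; small-sample estimates use an add-0.5 correction, chosen here only for illustration (the abstract does not say which estimation rule the paper recommends); dependence is measured with the expected mutual information measure (EMIM) and summarized by a maximum spanning tree in the style of Chow and Liu. The function names, the toy documents and the correction value are all hypothetical. The sketch stops at measuring dependence and does not build the nonlinear weighting function itself.

```python
import math
from itertools import combinations

# Hypothetical toy sample: each document is a set of binary index terms.
RELEVANT = [
    {"retrieval", "weighting", "index"},
    {"retrieval", "index"},
    {"weighting", "index", "dependence"},
]
NON_RELEVANT = [
    {"index"},
    {"catalogue", "library"},
    {"retrieval", "library"},
    {"catalogue"},
]


def estimate(count, total, correction=0.5):
    """Smoothed proportion (count + c) / (total + 2c).

    Assumed estimation rule: the add-0.5 correction keeps estimates from
    small samples strictly between 0 and 1."""
    return (count + correction) / (total + 2 * correction)


def independence_weight(term, rel, nonrel):
    """Relevance weight log[p(1-q) / (q(1-p))] assuming term independence,
    with p = P(term present | relevant), q = P(term present | non-relevant)."""
    p = estimate(sum(term in d for d in rel), len(rel))
    q = estimate(sum(term in d for d in nonrel), len(nonrel))
    return math.log(p * (1 - q) / (q * (1 - p)))


def emim(a, b, docs, correction=0.5):
    """Expected mutual information between presence/absence of terms a and b:
    sum over x, y in {0, 1} of P(x, y) log[P(x, y) / (P(x) P(y))]."""
    # Smoothed counts for the four presence/absence combinations.
    joint = {(x, y): correction for x in (0, 1) for y in (0, 1)}
    for d in docs:
        joint[(int(a in d), int(b in d))] += 1
    total = len(docs) + 4 * correction
    pa = {x: (joint[(x, 0)] + joint[(x, 1)]) / total for x in (0, 1)}
    pb = {y: (joint[(0, y)] + joint[(1, y)]) / total for y in (0, 1)}
    return sum(
        (c / total) * math.log((c / total) / (pa[x] * pb[y]))
        for (x, y), c in joint.items()
    )


def maximum_spanning_tree(terms, docs):
    """Kruskal's algorithm on EMIM edge weights: keep the strongest term
    dependencies that do not close a cycle (a Chow-Liu style tree)."""
    parent = {t: t for t in terms}

    def find(t):
        # Union-find root lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    edges = sorted(
        ((emim(a, b, docs), a, b) for a, b in combinations(sorted(terms), 2)),
        reverse=True,
    )
    tree = []
    for weight, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b, weight))
    return tree


if __name__ == "__main__":
    vocabulary = set().union(*RELEVANT, *NON_RELEVANT)
    for term in sorted(vocabulary):
        print(f"{term:12s} {independence_weight(term, RELEVANT, NON_RELEVANT):+.3f}")
    for a, b, w in maximum_spanning_tree(vocabulary, RELEVANT + NON_RELEVANT):
        print(f"{a} -- {b}: {w:.4f}")
```

Running the script prints one independence weight per term (positive when the term is relatively more frequent in the relevant sample) followed by the edges of the dependence tree. Computing the tree over all sample documents, rather than over the relevant ones only, is another assumption of this sketch; a dependence-aware weighting function would then condition each term on its neighbour in the tree instead of scoring it in isolation.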
