Automatic thesaurus for enhanced Chinese text retrieval

Library Review, Volume 49 (5): 11 – Jul 1, 2000

Publisher: Emerald Publishing
Copyright: Copyright © 2000 MCB UP Ltd. All rights reserved.
ISSN: 0024-2535
DOI: 10.1108/00242530010331754

Abstract

Asian languages such as Japanese, Korean and, in particular, Chinese are beginning to gain popularity in the information retrieval (IR) domain. The quality of an IR system has traditionally been judged by its retrieval effectiveness, which in turn is commonly measured by recall and precision. This paper proposes and describes a process for generating an automatic Chinese thesaurus that supplies terms related to a user's query in order to enhance retrieval effectiveness. In the absence of existing automatic Chinese thesauri, techniques used in English thesaurus generation were evaluated and adapted to generate a Chinese equivalent. The thesaurus is generated by computing co-occurrence values between domain-specific terms found in a document collection; these values are in turn derived from the term and document frequencies of the terms. A set of experiments was then carried out on a test document set to evaluate the applicability of the thesaurus. The results confirmed that such an automatically generated thesaurus is able to improve the retrieval effectiveness of a Chinese IR system.
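The abstract does not state the exact co-occurrence formula. As a minimal sketch, assuming the documents have already been segmented into terms (Chinese text has no spaces between words, so a segmenter is assumed to run beforehand) and using tf-idf weighted cosine similarity as the co-occurrence measure (a common choice, not necessarily the one used in the paper), building the thesaurus from term and document frequencies and using it to expand a query might look like this. The functions `build_thesaurus` and `expand_query`, their parameters `top_k` and `per_term`, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_thesaurus(docs, top_k=3):
    """Map each term to its top_k most strongly co-occurring terms.

    docs: list of documents, each an already-segmented list of terms
    (segmentation of raw Chinese text is assumed to happen elsewhere).
    Similarity: cosine of tf-idf weighted term-by-document vectors.
    """
    n_docs = len(docs)
    tfs = [Counter(doc) for doc in docs]          # term frequency per document
    df = Counter()                                # document frequency per term
    for tf in tfs:
        df.update(tf.keys())

    # Sparse vector per term: {document index: tf * idf}; terms that occur in
    # every document get idf = 0 and therefore drop out.
    vectors = defaultdict(dict)
    for i, tf in enumerate(tfs):
        for term, freq in tf.items():
            weight = freq * math.log(n_docs / df[term])
            if weight > 0:
                vectors[term][i] = weight
    norms = {t: math.sqrt(sum(w * w for w in v.values())) for t, v in vectors.items()}

    thesaurus = {}
    for t, vt in vectors.items():
        scored = []
        for u, vu in vectors.items():
            if u == t:
                continue
            # Only documents containing both terms contribute to the dot product.
            dot = sum(w * vu[i] for i, w in vt.items() if i in vu)
            if dot > 0:
                scored.append((dot / (norms[t] * norms[u]), u))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by term
        thesaurus[t] = [u for _, u in scored[:top_k]]
    return thesaurus


def expand_query(query_terms, thesaurus, per_term=1):
    """Append up to per_term related terms for each query term."""
    expanded = list(query_terms)
    for term in query_terms:
        for related in thesaurus.get(term, [])[:per_term]:
            if related not in expanded:
                expanded.append(related)
    return expanded


if __name__ == "__main__":
    # Toy collection of pre-segmented documents (illustrative data only).
    docs = [
        ["检索", "信息", "系统"],
        ["检索", "信息", "查询"],
        ["图书馆", "目录", "书目"],
        ["图书馆", "目录", "读者"],
    ]
    thesaurus = build_thesaurus(docs)
    print(thesaurus["检索"])                    # ['信息', '查询', '系统']
    print(expand_query(["检索"], thesaurus))    # ['检索', '信息']
```

A symmetric measure was chosen here for brevity. An asymmetric weighting, in which the strength of the link from term A to term B can differ from B to A, is an equally plausible reading of the abstract's description.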

Journal: Library Review, Emerald Publishing

Published: Jul 1, 2000

Keywords: Information retrieval; Automation; Data retrieval; China
