Automatic new topic identification in search engine transaction logs

Internet Research, Volume 16 (3): 16 – May 1, 2006

Publisher
Emerald Publishing
Copyright
Copyright © 2006 Emerald Group Publishing Limited. All rights reserved.
ISSN
1066-2243
DOI
10.1108/10662240610673727

Abstract

Purpose – Content analysis of search engine user queries is an important task: successfully exploiting query content can lead to more efficient information retrieval algorithms for search engines, which can in turn offer custom-tailored services to web users. Identifying topic changes within a user search session is a key issue in this analysis. The purpose of this study is to address these issues.

Design/methodology/approach – This study applies genetic algorithms and Dempster-Shafer theory, as proposed by He et al., to automatically identify topic changes in a user session from statistical characteristics of queries, such as time intervals and query reformulation patterns. The method is applied to a sample data log from the Norwegian search engine FAST (currently owned by Overture).

Findings – 97.7 percent of topic shifts and 87.2 percent of topic continuations were estimated correctly. These findings are consistent with the previous application of Dempster-Shafer theory and genetic algorithms to a different search engine data log. This suggests that content-ignorant topic identification, using query patterns and time intervals, is a promising line of research.

Originality/value – The study examines an important dimension of user behavior in information retrieval.
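To illustrate how evidence from time intervals and query reformulation patterns might be fused, the following is a minimal sketch of Dempster's rule of combination over the two hypotheses "shift" and "continuation". It is not the study's implementation: the mass values, the variable names, and the final decision rule are all assumptions made for illustration, and the genetic algorithm component mentioned in the abstract is not covered here.

```python
from itertools import product

# Frame of discernment: a query either starts a new topic or continues one.
SHIFT = frozenset({"shift"})
CONT = frozenset({"continuation"})
EITHER = SHIFT | CONT  # mass left uncommitted ("don't know")


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


# Hypothetical masses, invented for this example (not taken from the paper).
time_evidence = {SHIFT: 0.6, CONT: 0.1, EITHER: 0.3}      # e.g. a long gap
pattern_evidence = {SHIFT: 0.5, CONT: 0.2, EITHER: 0.3}   # e.g. a brand-new query

belief = combine(time_evidence, pattern_evidence)

# Assumed decision rule: pick the singleton hypothesis with the larger mass.
label = max((SHIFT, CONT), key=lambda h: belief.get(h, 0.0))
```

Because both sources lean toward a topic shift, the combined mass on "shift" ends up higher than either source assigned on its own, which is the usual reinforcing behavior of Dempster's rule when evidence agrees.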

Journal

Internet Research, Emerald Publishing

Published: May 1, 2006

Keywords: Search engines; Identification; Information retrieval; Cluster analysis
