Classifying information sender of web documents

Internet Research, Volume 18 (2): 13 – Apr 4, 2008

References (18)

Publisher: Emerald Publishing
Copyright: © 2008 Emerald Group Publishing Limited. All rights reserved.
ISSN: 1066-2243
DOI: 10.1108/10662240810862248

Abstract

Purpose – To develop a method for classifying the information sender of web documents, which constitutes an important part of information credibility analysis.

Design/methodology/approach – A machine learning approach was employed. About 2,000 human‐annotated web documents were prepared for training and evaluation. The classification model was based on a support vector machine, and the features used for classification included the title and URL of each document, as well as information from the top page.

Findings – With a relatively small set of features, the proposed method achieved over 50 per cent accuracy.

Research limitations/implications – Some of the information sender categories were found to be more difficult to classify than others. This is due to the subjective nature of the categories, and further refinement of the categories is needed.

Practical implications – When combined with opinion/sentiment analysis techniques, information sender classification enables deeper analysis based on interactions between opinions and senders. Such analysis forms a basis for information credibility analysis.

Originality/value – This study formulated the problem of information sender classification. It proposed a method that achieves moderate performance. It also identified some of the issues related to information sender classification.
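
The abstract names the title and URL of a document as classification features but does not say how they were encoded. The following is a minimal sketch, assuming a simple bag-of-tokens encoding; the function name `extract_features` and the feature prefixes (`title=`, `host=`, `tld=`, `path=`) are hypothetical and are not taken from the paper. Top-page information, which the paper also uses, is not covered here.

```python
import re
from typing import Dict
from urllib.parse import urlparse


def extract_features(title: str, url: str) -> Dict[str, int]:
    """Build a sparse token-count feature dict from a page title and URL.

    Illustrative only: the feature names and tokenization are assumptions,
    not the feature set described in the paper.
    """
    features: Dict[str, int] = {}

    def add(name: str) -> None:
        # Count each occurrence of a feature.
        features[name] = features.get(name, 0) + 1

    # Lowercased word tokens from the title.
    for token in re.findall(r"\w+", title.lower()):
        add("title=" + token)

    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    host_parts = [part for part in host.split(".") if part]

    # Each label of the host name, plus the top-level domain on its own.
    for part in host_parts:
        add("host=" + part)
    if host_parts:
        add("tld=" + host_parts[-1])

    # Alphanumeric tokens from the URL path.
    for token in re.findall(r"[a-z0-9]+", parsed.path.lower()):
        add("path=" + token)

    return features


if __name__ == "__main__":
    example = extract_features(
        "City Hall News",
        "http://www.city.example.jp/news/2008/index.html",
    )
    for name in sorted(example):
        print(name, example[name])
```

A dictionary of this kind could be vectorized and passed to any support vector machine implementation together with the human-annotated sender labels; that training step is left out because the abstract gives no details about the kernel or parameters.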

Journal: Internet Research (Emerald Publishing)

Published: Apr 4, 2008

Keywords: Information management; Project management; Worldwide web; Data analysis
