A web crawler design for data mining

Journal of Information Science, Volume 27 (5): 7 – Oct 1, 2001

References (31)

Publisher: SAGE
Copyright: Copyright © by SAGE Publications
ISSN: 0165-5515
eISSN: 1741-6485
DOI: 10.1177/016555150102700503

Abstract

The content of the web has increasingly become a focus for academic research. Computer programs are needed in order to conduct any large-scale processing of web pages, requiring the use of a web crawler at some stage in order to fetch the pages to be analysed. The processing of the text of web pages in order to extract information can be expensive in terms of processor time. Consequently a distributed design is proposed in order to effectively use idle computing resources and to help information scientists avoid the need to employ dedicated equipment. A system developed using the model is examined and the advantages and limitations of the approach are discussed.

Journal

Journal of Information Science, SAGE

Published: Oct 1, 2001
