A strategy for extracting information from semi‐structured web pages

Publisher
Emerald Publishing
Copyright
Copyright © 2010 Emerald Group Publishing Limited. All rights reserved.
ISSN
1744-0084
DOI
10.1108/17440081011090239

Abstract

Purpose – The aim of this paper is to propose a strategy for extracting information from web tables. Design/methodology/approach – The paper presents a strategy for extracting information from the web tables of semi‐structured web pages (WPs). It handles the problem of synonyms, which arises because these WPs were designed and created without reference to any standards or guidelines. Findings – The paper finds that this strategy extracts information with high precision. It extracts the attributes together with the sub‐attributes that describe them and the values of those sub‐attributes. Practical implications – An experiment conducted on the Nokia products domain demonstrated that the proposed strategy extracts information from web tables with a precision of 98.98 percent. Originality/value – This paper contributes to research on information extraction.
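
The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It reads two-cell rows from an HTML table as attribute/value pairs, maps synonymous attribute labels to one canonical name, and computes precision as the fraction of extracted pairs that match a reference set. The synonym map, the function names, and the sample table are all hypothetical and chosen purely for illustration.

```python
from html.parser import HTMLParser

# Hypothetical synonym map (an assumption, not from the paper):
# variant attribute labels -> one canonical attribute name.
SYNONYMS = {
    "weight": "weight",
    "mass": "weight",
    "display": "display",
    "screen": "display",
}


class TableRowParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_pairs(html):
    """Return {canonical_attribute: value} for every two-cell table row."""
    parser = TableRowParser()
    parser.feed(html)
    pairs = {}
    for row in parser.rows:
        if len(row) != 2:
            continue  # this sketch ignores rows that are not label/value
        label, value = row
        key = SYNONYMS.get(label.lower(), label.lower())
        pairs[key] = value
    return pairs


def precision(extracted, reference):
    """Fraction of extracted pairs that also appear in the reference set."""
    if not extracted:
        return 0.0
    correct = sum(1 for k, v in extracted.items() if reference.get(k) == v)
    return correct / len(extracted)


# Hypothetical sample table using the labels "Mass" and "Screen".
SAMPLE = (
    "<table>"
    "<tr><th>Mass</th><td>120 g</td></tr>"
    "<tr><th>Screen</th><td>2.4 in</td></tr>"
    "</table>"
)
pairs = extract_pairs(SAMPLE)
```

Here `pairs` holds the values under the canonical names `weight` and `display`, so tables that label the same attribute differently end up comparable. Sub-attributes, which the paper also extracts, are outside the scope of this sketch.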

Journal

International Journal of Web Information Systems, Emerald Publishing

Published: Nov 23, 2010

Keywords: Internet; Data handling; Information retrieval
