IEPAD: information extraction based on pattern discovery

Chia-Hui Chang and Shao-Chen Lui
Dept. of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan 320
Tel: +886-3-4227151x4523
anyway@db.csie.ncu.edu.tw


Association for Computing Machinery — Apr 1, 2001

Datasource: Association for Computing Machinery
Copyright: © 2001 by ACM Inc.
ISBN: 1-58113-348-0
DOI: 10.1145/371920.372182

Abstract

Fortunately, researchers have built tools that can generate wrappers automatically. WIEN [11], Softmealy [7], and Stalker [13] are three well-known works in this field. Much as scanner and parser generators for compilers take a syntax grammar from the user and produce the transition tables for scanner or parser drivers, these wrapper construction systems output extraction rules from training examples provided by the designer of the wrapper. The common idea is to use machine learning techniques to summarize extraction rules; the difference lies in the extractor architecture each system presumes, for example the single-pass, LR structure in WIEN and the multi-pass, hierarchical structure in Stalker. Nevertheless, the designer must manually label the beginning and the end of the training examples used to generate the rules, and manual labeling is, in general, time-consuming and inefficient. Recently, researchers have been exploring new approaches that fully automate wrapper construction, that is, without users' training examples. For example, Embley et

