A hidden Markov model‐based approach for extracting information from web news

International Journal of Web Information Systems, Volume 3 (1/2): 12 – Sep 28, 2007

Publisher
Emerald Publishing
Copyright
Copyright © 2007 Emerald Group Publishing Limited. All rights reserved.
ISSN
1744-0084
DOI
10.1108/17440080710829243

Abstract

Purpose – This paper presents a method based on hidden Markov models (HMMs) for extracting information from web news. Design/methodology/approach – The samples under study are drawn from the contents of PROC “People's Daily Online,” a web‐based news publication containing unstructured archives. The study focuses on developing HMM‐based tools for news filtering in order to retrieve terms of interest, such as “Geo‐location,” “System,” and “Personas.” The experiments are performed in two stages. In the first stage, each HMM is built exclusively to extract a single target term, in order to evaluate the fundamental information extraction (IE) capability. In the second stage, the experiment is extended to the more complex problem of multi‐term extraction. Findings – The results show that, with HMMs as a basis, the accuracy (F‐measure) for single‐term IE tasks exceeds 70 per cent on average, while multi‐term extraction achieves an accuracy of no less than 66 per cent. Originality/value – The study shows the promise of using HMMs to develop automatic tools for filtering free‐structured data.
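The abstract does not give the model topology, the training procedure, or the exact F‐measure variant, so the following is only a minimal sketch of the general technique rather than the authors' implementation. It assumes a hypothetical two‐state HMM (`BG` for background text, `LOC` for a target term such as a geo‐location), hand‐set toy probabilities, Viterbi decoding to label tokens, and the balanced F‐measure (harmonic mean of precision and recall) for scoring. All names and numbers are illustrative assumptions.

```python
import math

# Hypothetical two-state tagger: "BG" (background text), "LOC" (target term).
STATES = ["BG", "LOC"]

# Toy, hand-set probabilities (assumptions, not values from the paper).
START = {"BG": 0.9, "LOC": 0.1}
TRANS = {
    "BG": {"BG": 0.8, "LOC": 0.2},
    "LOC": {"BG": 0.7, "LOC": 0.3},
}
EMIT = {
    "BG": {"talks": 0.3, "held": 0.3, "in": 0.3, "beijing": 0.05, "shanghai": 0.05},
    "LOC": {"talks": 0.02, "held": 0.02, "in": 0.02, "beijing": 0.47, "shanghai": 0.47},
}
UNSEEN = 1e-6  # floor probability for tokens missing from the emission table


def viterbi(tokens):
    """Return the most probable state sequence for tokens (log-space Viterbi)."""
    if not tokens:
        return []

    def emit(state, tok):
        return math.log(EMIT[state].get(tok, UNSEEN))

    # best[t][s] = (log-probability of the best path ending in s at t, previous state)
    best = [{s: (math.log(START[s]) + emit(s, tokens[0]), None) for s in STATES}]
    for tok in tokens[1:]:
        row = {}
        for s in STATES:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            row[s] = (score + emit(s, tok), prev)
        best.append(row)

    # Backtrack from the highest-scoring final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    return path[::-1]


def f_measure(predicted, gold, target="LOC"):
    """Balanced F-measure for one target label over aligned tag sequences."""
    pairs = list(zip(predicted, gold))
    tp = sum(p == target and g == target for p, g in pairs)
    fp = sum(p == target and g != target for p, g in pairs)
    fn = sum(p != target and g == target for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


tags = viterbi(["talks", "held", "in", "beijing"])
score = f_measure(tags, ["BG", "BG", "BG", "LOC"])
```

In this sketch one HMM targets one term type, which mirrors the single‐term first stage described above; a multi‐term variant would add one state per term type, though how the paper actually structures that case is not stated in the abstract.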

Journal

International Journal of Web Information Systems (Emerald Publishing)

Published: Sep 28, 2007

Keywords: Markov processes; Archiving; Computer communications software
