IMPACT: working together to address the challenges involving mass digitization of historical printed text

OCLC Systems & Services, Volume 25 (4): 16 – Oct 30, 2009

Publisher
Emerald Publishing
Copyright
Copyright © 2009 Emerald Group Publishing Limited. All rights reserved.
ISSN
1065-075X
DOI
10.1108/10650750911001824

Abstract

Purpose – The purpose of this paper is to address the most urgent challenges that libraries face in the mass digitization of historical printed text: the unsatisfactory results of converting scanned images into full-featured electronic text by means of automated optical character recognition (OCR); the historical language barrier for text from around 1850 and earlier, caused by the inadequacy of most existing lexica of historical language for OCR or post-correction; and a lack of institutional knowledge and expertise in libraries, museums and archives.

Design/methodology/approach – In the EC-funded project IMPACT (Improving Access to Text), seven libraries, six research institutes and two private sector companies across Europe are working together to address these challenges by developing OCR software and technologies that significantly exceed the accuracy of current state-of-the-art software. The IMPACT solutions cover the entire recognition process after the document leaves the scanner: image processing, OCR processing (including the use of dictionaries), OCR correction and document formatting. IMPACT will also build capacity in mass digitization by sharing best practice and expertise with the cultural heritage communities in Europe.

Findings – Technical results will include toolkits for image enhancement and segmentation, an adaptive OCR engine and several prototype experimental OCR engines, computational lexica, and several post-correction modules, including a web-based collaborative correction system and a parser for structural metadata. Strategic tools will include several decision support tools, guidelines, a website with a demonstrator platform, a training programme and, ultimately, a sustainable Centre of Competence for mass digitization in Europe.

Originality/value – The IMPACT solutions will, for the first time, make it possible to transform large amounts of digitized historical text into electronic text with a minimum of manual intervention and significantly improved accessibility for the user.

Journal

OCLC Systems & Services, Emerald Publishing

Published: Oct 30, 2009

Keywords: Digital storage; Europe; Language; Image processing
