Optimisation of archival processes involving digitisation of typewritten documents

Publisher: Emerald Publishing
Copyright: © Emerald Publishing Limited
ISSN: 2050-3806
DOI: 10.1108/ajim-11-2019-0326

Abstract

Purpose: The authors investigate optical character recognition (OCR) technology and discuss its implementation in the context of the digitisation of archival materials.

Design/methodology/approach: The typewritten transcripts of the Croatian Writers' Society from the mid-1960s are used as the test data. The optimal digitisation setup was investigated in order to obtain the best OCR results, using a sample of 123 pages digitised at different resolution settings and binarisation levels.

Findings: A series of tests showed that different settings produce significantly different results. The best OCR accuracy achieved on the test sample of typewritten documents was 95.02%. The results show that resolution is significantly more important than the binarisation pre-processing procedure for achieving better OCR results.

Originality/value: Based on the research results, the authors give recommendations for an optimal digitisation process setup, with the aim of increasing the quality of OCR results. Finally, the authors place the research results in the context of the digitisation of cultural heritage in general and discuss possibilities for further investigation.
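The comparison described in the abstract, scoring OCR output across combinations of scan resolution and binarisation level, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how accuracy was computed, so the sketch uses a common character-level measure based on edit distance, and the function names (`levenshtein`, `character_accuracy`, `binarise`, `best_setting`) and the simple global threshold are hypothetical, not the authors' method.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Assumed metric: 100 * (1 - edit errors / reference length), floored at 0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 100.0 * (1 - errors / len(reference)))


def binarise(gray_rows, threshold):
    """Global thresholding (an assumed, simplest-case binarisation):
    pixels at or above the threshold become white (255), the rest black (0)."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray_rows]


def best_setting(results):
    """results maps (dpi, threshold) -> accuracy in percent; return the best entry."""
    return max(results.items(), key=lambda item: item[1])
```

In use, each page would be scanned at each resolution, binarised at each level, passed through an OCR engine, and scored against a hand-corrected transcription with `character_accuracy`; `best_setting` then picks the combination with the highest score.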

Journal

Aslib Journal of Information Management, Emerald Publishing

Published: Nov 12, 2020

Keywords: Digitisation; Optical character recognition; Resolution; Binarisation; Typewritten documents; Archival materials; Cultural heritage
