Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Lost in migration: document quality for batch conversion to PDF/A

Lost in migration: document quality for batch conversion to PDF/A Changes in file format specifications challenge long-term preservation of digital documents. Digital archives thus often focus on specific file formats that are well suited for long-term preservation, such as the PDF/A format. Since only few customers submit PDF/A files, digital archives may consider converting submitted files to the PDF/A format. The paper aims to discuss these issues.Design/methodology/approachThe authors evaluated three software tools for batch conversion of common file formats to PDF/A-1b: LuraTech PDF Compressor, Adobe Acrobat XI Pro and 3-HeightsTM Document Converter by PDF Tools. The test set consisted of 80 files, with 10 files each of the eight file types JPEG, MS PowerPoint, PDF, PNG, MS Word, MS Excel, MSG and “web page.”FindingsBatch processing was sometimes hindered by stops that required manual interference. Depending on the software tool, three to four of these stops occurred during batch processing of the 80 test files. Furthermore, the conversion tools sometimes failed to produce output files even for supported file formats: three (Adobe Pro) up to seven (LuraTech and 3-HeightsTM) PDF/A-1b files were not produced. Since Adobe Pro does not convert e-mails, a total of 213 PDF/A-1b files were produced. The faithfulness of each conversion was investigated by comparing the visual appearance of the input document with that of the produced PDF/A-1b document on a computer screen. Meticulous visual inspection revealed that the conversion to PDF/A-1b impaired the information content in 24 of the converted 213 files (11 percent). These reproducibility errors included loss of links, loss of other document content (unreadable characters, missing text, document part missing), updated fields (reflecting time and folder of conversion), vector graphics issues and spelling errors.Originality/valueThese results indicate that large-scale batch conversions of heterogeneous files to PDF/A-1b cause complex issues that need to be addressed for each individual file. Even with considerable efforts, some information loss seems unavoidable if large numbers of files from heterogeneous sources are migrated to the PDF/A-1b format. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Library Hi Tech Emerald Publishing

Lost in migration: document quality for batch conversion to PDF/A

Library Hi Tech , Volume 39 (2): 15 – Jun 21, 2021

Loading next page...
 
/lp/emerald-publishing/lost-in-migration-document-quality-for-batch-conversion-to-pdf-a-9Ky32BQQ6g

References (19)

Publisher
Emerald Publishing
Copyright
© Emerald Publishing Limited
ISSN
0737-8831
DOI
10.1108/lht-10-2017-0220
Publisher site
See Article on Publisher Site

Abstract

Changes in file format specifications challenge long-term preservation of digital documents. Digital archives thus often focus on specific file formats that are well suited for long-term preservation, such as the PDF/A format. Since only few customers submit PDF/A files, digital archives may consider converting submitted files to the PDF/A format. The paper aims to discuss these issues.Design/methodology/approachThe authors evaluated three software tools for batch conversion of common file formats to PDF/A-1b: LuraTech PDF Compressor, Adobe Acrobat XI Pro and 3-HeightsTM Document Converter by PDF Tools. The test set consisted of 80 files, with 10 files each of the eight file types JPEG, MS PowerPoint, PDF, PNG, MS Word, MS Excel, MSG and “web page.”FindingsBatch processing was sometimes hindered by stops that required manual interference. Depending on the software tool, three to four of these stops occurred during batch processing of the 80 test files. Furthermore, the conversion tools sometimes failed to produce output files even for supported file formats: three (Adobe Pro) up to seven (LuraTech and 3-HeightsTM) PDF/A-1b files were not produced. Since Adobe Pro does not convert e-mails, a total of 213 PDF/A-1b files were produced. The faithfulness of each conversion was investigated by comparing the visual appearance of the input document with that of the produced PDF/A-1b document on a computer screen. Meticulous visual inspection revealed that the conversion to PDF/A-1b impaired the information content in 24 of the converted 213 files (11 percent). These reproducibility errors included loss of links, loss of other document content (unreadable characters, missing text, document part missing), updated fields (reflecting time and folder of conversion), vector graphics issues and spelling errors.Originality/valueThese results indicate that large-scale batch conversions of heterogeneous files to PDF/A-1b cause complex issues that need to be addressed for each individual file. Even with considerable efforts, some information loss seems unavoidable if large numbers of files from heterogeneous sources are migrated to the PDF/A-1b format.

Journal

Library Hi TechEmerald Publishing

Published: Jun 21, 2021

Keywords: Digital documents; Digital libraries; Academic libraries; Archives; Conversion; Digital preservation

There are no references for this article.