An improved pattern matching technique for lossy/lossless compression of binary printed Farsi and Arabic textual images


Publisher: Emerald Publishing
Copyright: © 2009 Emerald Group Publishing Limited. All rights reserved.
ISSN: 1756-378X
DOI: 10.1108/17563780910939273

Abstract

Purpose – This paper proposes a lossy/lossless compression method for binary textual images, based on an improved pattern matching (PM) technique.

Design/methodology/approach – In the Farsi/Arabic script, unlike the printed Latin script, letters usually attach to one another and produce a wide variety of patterns. As a result, some patterns are fully or partially subsets of others. Two new ideas are proposed. First, the number of library prototypes is reduced by detecting and removing fully or partially similar prototypes. Second, a new pattern encoding scheme is proposed for all types of patterns, including text and graphics. The scheme has two operation modes, chain coding and soft PM, and selects between them according to the ratio of the pattern area to its effective chain code length. To encode the number sequences, the authors have modified the multi-symbol QM-coder. The proposed method has three levels of lossy compression, and each successive level further increases the compression ratio. The first level applies processing in the chain code domain: omitting small patterns and holes, omitting the inner holes of characters, and smoothing pattern boundaries. The second level applies the selective pixel reversal technique. The third level applies the proposed method of prioritizing the residual patterns for encoding according to their degree of compactness.

Findings – Experimental results show that, at 300 dpi, the proposed method compresses 1.6-3 times better than the best existing binary textual image compression methods in the lossy case, and 1.3-2.4 times better in the lossless case. The maximum compression ratios are achieved for Farsi and Arabic textual images.

Research limitations/implications – Only binary printed typeset textual images are considered.

Practical implications – The proposed method achieves a high compression ratio, which suits archiving and storage applications.

Originality/value – To the best of the authors' knowledge, existing textual image compression methods and standards have not so far exploited the full or partial similarity of prototypes to increase the compression ratio for any script. The idea of combining boundary description methods with run-length and arithmetic coding techniques has also not been used before.
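The abstract describes choosing between chain coding and soft PM from the ratio of a pattern's area to its chain code length, and ranking residual patterns by compactness. The following is only a minimal sketch of those two quantities, not the authors' implementation. Several things here are assumptions: the function names, the use of exposed pixel edges as a stand-in for the "effective chain code length", the threshold value, the direction of the decision (high ratio selects chain coding), and the compactness formula 4πA/P², which is a common definition that the paper may not share.

```python
from math import pi

def area(pattern):
    """Number of foreground (1) pixels in a binary pattern given as a list of rows."""
    return sum(sum(row) for row in pattern)

def boundary_length(pattern):
    """Count exposed pixel edges; an assumed proxy for the chain code length."""
    h = len(pattern)
    w = len(pattern[0]) if h else 0
    edges = 0
    for y in range(h):
        for x in range(w):
            if not pattern[y][x]:
                continue
            # An edge is exposed if its 4-neighbour is background or off-image.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not pattern[ny][nx]:
                    edges += 1
    return edges

def choose_mode(pattern, threshold=1.0):
    """Pick an encoding mode from the area-to-boundary ratio.

    The threshold and the direction of the comparison are hypothetical;
    the abstract only says the mode depends on this ratio.
    """
    p = boundary_length(pattern)
    if p == 0:
        return "soft_pm"
    return "chain_coding" if area(pattern) / p >= threshold else "soft_pm"

def compactness(pattern):
    """Common compactness measure 4*pi*A/P^2 (assumed, not taken from the paper)."""
    p = boundary_length(pattern)
    return 4 * pi * area(pattern) / (p * p) if p else 0.0

solid = [[1] * 4 for _ in range(4)]   # 4x4 filled square
line = [[1] * 8]                      # 1x8 horizontal stroke
print(choose_mode(solid), choose_mode(line))
print(compactness(solid) > compactness(line))
```

Under these assumptions, a filled block has a high area-to-boundary ratio and would go to chain coding, while a thin stroke would go to soft PM. Sorting residual patterns by `compactness` would give one possible priority order for the third lossy level.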

Journal

International Journal of Intelligent Computing and Cybernetics, Emerald Publishing

Published: Mar 27, 2009

Keywords: Libraries; Languages
