Error correction vs. query garbling for Arabic OCR document retrieval

Publisher
Association for Computing Machinery
Copyright
Copyright © 2007 by ACM Inc.
ISSN
1046-8188
DOI
10.1145/1292591.1292596

Abstract

Given the large number of legacy documents (such as old books and newspapers), improving retrieval effectiveness for OCR'ed documents remains an important problem. This article compares the effect of OCR error correction, with and without language modeling, and the effect of query garbling with weighted structured queries on the retrieval of OCR-degraded Arabic documents. The results suggest that moderate error correction does not yield a statistically significant improvement in retrieval effectiveness when indexing and searching with n-grams. Reversing error correction models to perform query garbling, in conjunction with weighted structured queries, also improves retrieval effectiveness. Lastly, very good error correction that uses language modeling yields the best improvement in retrieval effectiveness.
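The query-garbling idea in the abstract can be illustrated with a small sketch. Instead of correcting the OCR'ed documents, each clean query word is expanded into the OCR renderings that an error model considers likely, and those variants are grouped into a weighted synonym set. Everything below is an assumption for illustration, not the article's implementation: `ERROR_MODEL` is a hypothetical toy table of P(observed OCR string | intended character) over Latin letters (a real model would be trained on aligned clean and OCR'ed Arabic text), `garble`, `weighted_structured_query`, and `score` are hypothetical helper names, the beam width of 4 is arbitrary, and documents are matched on whole words with a simple weighted term-frequency sum rather than the n-gram indexing and retrieval model used in the article.

```python
import heapq
from collections import Counter

# Hypothetical toy error model (assumption, not from the article):
# P(observed OCR string | intended character). Characters missing from the
# table are assumed to be recognized correctly with probability 1.
ERROR_MODEL = {
    "m": [("m", 0.9), ("rn", 0.1)],
    "l": [("l", 0.8), ("1", 0.2)],
}


def garble(word, model, beam=4):
    """Return up to `beam` likely OCR renderings of `word`, weights summing to 1."""
    hyps = [("", 1.0)]
    for ch in word:
        options = model.get(ch, [(ch, 1.0)])
        merged = {}
        for prefix, p in hyps:
            for out, q in options:
                # Different paths can yield the same string; add their probabilities.
                merged[prefix + out] = merged.get(prefix + out, 0.0) + p * q
        # Beam pruning: keep only the most probable partial renderings.
        hyps = heapq.nlargest(beam, merged.items(), key=lambda h: h[1])
    total = sum(p for _, p in hyps)
    return {variant: p / total for variant, p in hyps}


def weighted_structured_query(query, model, beam=4):
    """One weighted-synonym group of garbled variants per query word."""
    return [garble(word, model, beam) for word in query.split()]


def score(doc_tokens, structured_query):
    """Assumed scoring: sum of variant term frequencies, weighted per group."""
    tf = Counter(doc_tokens)
    return sum(
        weight * tf[variant]
        for group in structured_query
        for variant, weight in group.items()
    )


if __name__ == "__main__":
    query = weighted_structured_query("milk", ERROR_MODEL)
    docs = {"d1": ["fresh", "milk"], "d2": ["rnilk", "jug"], "d3": ["juice"]}
    for name, tokens in docs.items():
        print(name, round(score(tokens, query), 2))
```

With this toy model, a query for `milk` yields the variants `milk`, `mi1k`, `rnilk`, and `rni1k`, weighted roughly 0.72, 0.18, 0.08, and 0.02, so a document containing only the garbled form `rnilk` still receives a small nonzero score. A narrow beam keeps the structured query small, at the cost of dropping rare misrecognitions.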

Journal

ACM Transactions on Information Systems (TOIS), Association for Computing Machinery

Published: Nov 1, 2007
