TY - JOUR
AU1 - Shin, Christian K.
AB - Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout of a document contains a significant amount of information that can be used to classify a document's type in the absence of domain-specific models. A document type or genre can be defined by the user based primarily on layout structure. Our classification approach builds a supervised classifier based on the 'visual similarity' of layout structure, given examples of the class. We use image features, such as the percentages of text and non-text (graphics, image, table, and ruling) content regions, column structures, variations in the point size of fonts, the density of content area, and various statistics on features of connected components which can be derived from class samples without class knowledge. In order to obtain class labels for training samples, we conducted a user relevance test where subjects ranked UW-I document images with respect to the 12 representative images. We implemented our classification scheme using OC1, a decision tree classifier, and report our findings.
TI - Classification of document page images based on visual similarity of layout structures
JF - Proceedings of SPIE
DO - 10.1117/12.373493
DA - 1999-12-22
UR - https://www.deepdyve.com/lp/spie/classification-of-document-page-images-based-on-visual-similarity-of-px4aadXbyh
SP - 182
EP - 190
VL - 3967
IS - 1
DP - DeepDyve
ER -