J Supercomput
https://doi.org/10.1007/s11227-018-2444-0

Improving personal information detection using OCR feature recognition rate

YoungKyung Lee · Jinho Song · Yoojae Won

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract
With the recent advancements in information and communication technologies, the creation and storage of documents has become digitalized. Therefore, many documents are stored on computers. Documents containing personal information can be leaked by internal or external malicious acts, and the problem of information loss for individuals and corporations is gradually increasing. This paper proposes a method to more efficiently and quickly identify the existence of personal information among documents stored in image files on personal and corporate computers to prevent their leakage in advance. We improved the efficiency of personal information detection by classifying optical character recognition (OCR) features by recognition rate and deleting redundant ones to increase detection speed. In addition, the detection time was reduced using the reference frequency of the classified OCR features. Experiments confirm an improvement in the performance of the proposed method compared with that of the existing system.

Keywords
Character recognition · Personal information · Image files · Regular expression · Text learning · Data classification
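
The abstract names regular expressions and reference frequency but gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the OCR step has already produced plain text, and it reorders regular-expression checks by how often each one has matched before, so that frequently referenced patterns are tried first. The `PATTERNS` table, the `Detector` class, and both expressions are hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; the abstract does not
# list the expressions used in the paper.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{2,3}-\d{3,4}-\d{4}\b"),
}


class Detector:
    """Checks OCR output text against patterns, most-referenced first."""

    def __init__(self, patterns):
        self.patterns = dict(patterns)
        self.hits = Counter()  # how many times each pattern has matched

    def contains_personal_info(self, text):
        # Order patterns by past match count (descending). sorted() is
        # stable, so ties keep their original insertion order.
        ordered = sorted(self.patterns, key=lambda name: -self.hits[name])
        for name in ordered:
            if self.patterns[name].search(text):
                self.hits[name] += 1
                return name  # stop at the first match to save time
        return None
```

A caller would create one `Detector(PATTERNS)` and pass it the recognized text of each image file in turn. Stopping at the first match is enough when the goal is only to decide whether a file contains personal information at all.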
The Journal of Supercomputing – Springer Journals
Published: May 29, 2018