Intelligent Recognition of Time Stamp Characters in Solar Scanned Images from Film

Hindawi Advances in Astronomy, Volume 2019, Article ID 6565379, 9 pages. https://doi.org/10.1155/2019/6565379

Research Article

Jiafeng Zhang (1), Guangzhong Lin (1), Shuguang Zeng (1), Sheng Zheng (1), Xiao Yang (2), Ganghua Lin (2), Xiangyun Zeng (1), and Haimin Wang (3)

(1) College of Science, China Three Gorges University, Yichang 443002, China
(2) Key Laboratory of Solar Activity, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China
(3) Institute for Space Weather Sciences, New Jersey Institute of Technology, 323 Martin Luther King Boulevard, Newark, NJ 07102-1982, USA

Correspondence should be addressed to Shuguang Zeng; zengshuguang19@163.com

Received 11 April 2019; Revised 4 July 2019; Accepted 18 July 2019; Published 28 August 2019

Academic Editor: Geza Kovacs

Copyright © 2019 Jiafeng Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Prior to the availability of digital cameras, solar observational images were typically recorded on film, and information such as the date and time was stamped in the same frames. Extracting the time stamp information from the film is important so that researchers can use the image data efficiently. This paper introduces an intelligent method for extracting time stamp information based on the convolutional neural network (CNN), a deep learning algorithm built on multilayer neural network structures that can identify the time stamp characters in scanned solar images. We carry out time stamp decoding for the digitized data of the National Solar Observatory from 1963 to 2003. The experimental results show that the method is accurate and fast for this application. We complete the time stamp information extraction for more than 7 million images with an accuracy of 98%.

1. Introduction

The chromosphere is a layer of the solar atmosphere between the photosphere and the corona. The chromospheric magnetic field structure is highly dynamic, and the most intensive activities are solar flares. In order to study solar flares and other solar activities, it is necessary to accumulate flare observations in the chromosphere over many years. Therefore, a number of solar telescopes have been established around the world, for example, the Solar Magnetic Field Telescope (SMFT) in Huairou [1], China, and the McMath-Pierce Solar Telescope in Arizona [2], USA. Prior to the availability of modern digital cameras, the main medium for recording solar chromosphere data was film.
In order to make use of this rich historical data, many projects are digitizing historical astronomical data, and new research results have been obtained from the old data, such as the observation of a Moreton wave and wave-filament interactions associated with the renowned X9 flare on 1990 May 24 [3] and circular ribbon flares and homologous jets [4]. Because of the huge amount of data, the time stamps of many digitized chromospheric images are still in a form that cannot be read directly by computer, which has hindered further research. Digitizing the time stamps allows the data to be analyzed more efficiently. Therefore, time stamp decoding is a significant problem that we intend to solve.

From 1963 to 2003, full-disk Halpha images were recorded on 35 mm film with a cadence of 1 minute or even shorter at the National Solar Observatory (NSO) of the US. More than 8 million pictures have been recorded and then digitized by the New Jersey Institute of Technology (NJIT), covering hundreds of solar flares and other activities. This will create a valuable data archive of solar eruptions, a huge advance in solar astronomy. However, the data cannot be used before the time stamps are decoded. An example of a chromospheric image is shown in Figure 1. Besides the full-disk solar image, each frame records information such as the year, month, day, hour, minute, second, and film number. The time/date when the picture was taken is what we need to extract. As the amount of data is very large, automatically identifying the characters of the time stamp is the key to the efficient usage of the data. To solve the problem of character recognition, many methods have been proposed, such as the support vector machine algorithm [5], deep learning algorithms [6–8], and so on.

Figure 1: Full-disk chromosphere image obtained in Halpha by NSO. The time stamp (second, minute, hour, year, month, day) is on the left side of the image, and the full-disk chromosphere image is in the middle.

Recently, the convolutional neural network (CNN) [9, 10] has become a popular deep learning algorithm with high classification accuracy. It has been widely used in face recognition [11], image classification [9], speech recognition [12], character recognition [13, 14], etc. Zheng et al. [13] applied it to character recognition in the sunspot drawings of Yunnan Observatory, with an accuracy of 98.5%. Goodfellow et al. [14] applied a CNN to the Street View House Numbers (SVHN) dataset with an accuracy of 96%. We adopt the CNN for character recognition because of its high accuracy. The selection of samples is the key to the recognition accuracy of the CNN. However, the characters in the time stamp are specific and not included in any existing digit sample database, so we need to create a sample database for them as a training set. In addition, many images are ambiguous, which remains a big hindrance to character segmentation and recognition.
In this paper, we present an intelligent recognition method for automatic segmentation and recognition of characters based on CNN. The paper is organized as follows. Section 2 is an introduction to the CNN algorithm. In Section 3, we apply the CNN algorithm to time stamp recognition. Section 4 demonstrates the recognition results of this method. Finally, we give a conclusion in Section 5.

2. Convolutional Neural Network

The CNN [9, 10, 15] includes an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer. A typical structure is shown in Figure 2. Feature vectors are extracted from the input data by the convolutional, pooling, and fully connected layers and are then used to classify the input data by logistic regression.

Figure 2: Convolutional neural network structure, with input layer, convolution layer, pooling layer, fully connected layer, and output layer.

Multiple convolutional layers, pooling layers, and fully connected layers are possible in a CNN. The convolution layer detects the characteristics of the input to the maximum extent by randomly generating sufficient convolution kernels, and a large number of feature maps are generated after passing through it. The convolution layer is usually followed by an activation function, which maps features from a linear space into a nonlinear space to achieve nonlinear classification [16]. ReLU, sigmoid, and tanh are commonly applied as activation functions. In this paper, ReLU is adopted, which can effectively prevent overfitting problems. The pooling layer is a feature filter for the convolutional layer: it preserves the main features and reduces the amount of computation. It is often placed between two convolutional layers.

The data processed by multiple convolution and pooling layers are connected to one or more fully connected layers. In a fully connected layer, each neuron is connected to all neurons in the previous layer to combine the features extracted earlier, so the extracted features are completely preserved and unaffected by their position in the original image. The output value of the output layer is classified by logistic regression. Softmax regression is usually used when dealing with multiclass classification problems: it outputs a probability value for each class and selects the class with the maximum probability as the recognition result. In addition, the recognition accuracy of a CNN is closely related to the quality and quantity of the samples; the richer the training samples, the higher the recognition accuracy.
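To make the classification step concrete, the sketch below shows how a softmax output turns final-layer scores into class probabilities and a predicted digit. This is a minimal illustration of the standard softmax formula, not code from the paper; the score values are made up.

```python
import numpy as np

def softmax(scores):
    """Convert raw output-layer scores into class probabilities."""
    # Subtract the max score for numerical stability before exponentiating.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

# Hypothetical final-layer scores for the ten digit classes 0-9.
scores = np.array([1.2, 0.3, 4.8, 0.1, -0.5, 0.9, 2.0, 0.0, 0.7, 1.1])
probs = softmax(scores)
print(probs.round(3))          # probabilities summing to 1
print(int(np.argmax(probs)))   # predicted class: 2 in this example
```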
3. Time Stamp Character Recognition Based on CNN

The information we need to extract from each image is the year, month, day, hour, and minute. Figures 3 and 4 show chromospheric pictures with the two types of time stamp. The time stamp in Figure 3 is black on white, while that in Figure 4 is white on black. The time stamps are uneven, the format and color of the characters are inconsistent, the YMD (year, month, and day) characters are small, and the characters in many pictures are illegible and difficult to recognize. However, the date information is continuous, and there are many images on the same date, so we only need to obtain the date of the first picture of each day without intelligent recognition; that part was done manually. The CNN is used to identify the HM (hour and minute) characters.

Figure 3: Black character image.

Figure 4: White character image.

The flow chart of the CNN algorithm for recognizing time stamp characters is shown in Figure 5. It consists of two independent parts: one for character segmentation (Section 3.1) and the other for character recognition by CNN (Section 3.2). The left part of the flow chart introduces image segmentation, and the right part introduces character recognition. The input image is processed as white characters by default. If no character areas can be extracted, the process returns to the binarization step and reverses the colors of white and black in the binary image. The CNN is retrained when it has a low recognition rate on the test samples.

Figure 5: Algorithm flow chart.

3.1. Character Segmentation. The size of the original image is 1600 × 2048, as shown in Figure 3. The time stamp is on the left or right side of the picture, and the character format differs between pictures. Characters are divided into two categories, black and white, which need to be dealt with separately. The character segmentation steps are as follows (a code sketch of the core steps follows the list).

Step 1. Remove the solar disk from the picture and keep the left and right sides of the picture.

Step 2. Locate the strip containing the time stamp based on the intensity variance across the picture, and rotate it to adjust the direction of the characters (Figure 6(a)).

Step 3. Eliminate noise in the picture with a top-hat operation (Figure 6(b)).

Step 4. Binarize the picture with the Sauvola algorithm [17].

Step 5. Keep only the connected domains whose area lies in (500, 1000).

Step 6. Extract character regions using the stroke width transform algorithm [18, 19] (Figure 6(e)).

Step 7. If there are no character regions, return to step 4: reverse the colors of white and black in the binary image obtained in step 4 (Figure 6(c)), so that black characters in the original image can be extracted as white ones. This ensures data consistency so that the subsequent steps are as identical as possible; the result after step 5 is shown in Figure 6(d). If there are still no character regions after the image is reversed, there are no characters in the current picture. Because there are only two forms of time stamp, and only a small part of the images contain no time stamp, only those images yield no time stamp characters in the above process.

Step 8. Extract the corresponding regions from the original image according to the binary image, and resize each character to 28 × 28 (Figure 6(f)).

Figure 6: Extracting characters from Figure 3: (a) position the time stamp and rotate, (b) after top-hat operation and binarization, (c) black-and-white inversion of the binary image, (d) extraction of the related connectivity domains, (e) extracted character regions, and (f) the character extraction result.
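The following sketch shows how the core of steps 3–5 and 8 could look with scikit-image. It is a simplified illustration under assumed parameter values (structuring-element size, Sauvola window, file name), not the authors' actual code, and it omits the rotation and stroke-width-transform stages.

```python
import numpy as np
from skimage import io, morphology, measure, transform
from skimage.filters import threshold_sauvola

def segment_characters(strip, invert=False):
    """Rough sketch of steps 3-5 and 8: top-hat, Sauvola binarization,
    area filtering, and 28x28 character extraction."""
    if invert:                          # step 7: retry with colors reversed
        strip = strip.max() - strip
    # Step 3: white top-hat suppresses large-scale background variations.
    clean = morphology.white_tophat(strip, morphology.square(15))
    # Step 4: Sauvola local thresholding (window size is an assumed value).
    binary = clean > threshold_sauvola(clean, window_size=25)
    # Step 5: keep connected components with area in (500, 1000).
    labels = measure.label(binary)
    chars = []
    for region in measure.regionprops(labels):
        if 500 < region.area < 1000:
            r0, c0, r1, c1 = region.bbox
            # Step 8: cut the region from the strip and resize to 28x28.
            chars.append(transform.resize(strip[r0:r1, c0:c1], (28, 28)))
    return chars

strip = io.imread("timestamp_strip.png", as_gray=True)  # hypothetical input
chars = segment_characters(strip) or segment_characters(strip, invert=True)
```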
3.2. Character Recognition. The CNN model for time stamp character recognition consists of two convolutional layers, two pooling layers, and a fully connected layer (Figure 7). In the first convolutional layer, Con_1, 6 different convolutional kernels of size 5 × 5 perform the convolution operation on character pictures of size 28 × 28. After Con_1, the original character picture becomes a 24 × 24 × 6 feature map. The first pooling layer, Pool_1, filters the feature map with a maximum pooling function and a sliding window of 2 × 2, producing a feature map of size 12 × 12 × 6. The convolutional layer Con_2 contains 10 kernels of size 5 × 5, and the pooling layer Pool_2 does the same as Pool_1. These feature maps are taken as the inputs of the fully connected layer (F) to obtain the feature vector. Finally, the vector is classified by the softmax function to obtain the recognition result.

Figure 7: Convolutional neural network structure for character recognition, consisting of an input layer (28 × 28), two convolution layers (Con_1: 6@24 × 24; Con_2: 12@8 × 8), two pooling layers (Pool_1: 6@12 × 12; Pool_2: 12@4 × 4), a fully connected layer (F), and a softmax output.

The training of the CNN in this paper is divided into the following three steps (a sketch of the resulting network and training call follows the list).

Step 1. Add labels to the single-character images as samples for training the network.

Step 2. The character image is used as the X vector for the input layer, and the label of the image is used as the Y vector.

Step 3. The network is trained by the forward propagation and back propagation algorithms, and its coefficients are updated by loop iteration. A network structure with higher recognition accuracy is obtained in the end.

To train the CNN, we selected 100,000 single-character images of size 28 × 28, cut from original images with white characters, as training samples, 10,000 samples per character. These characters are recognizable by humans and were labeled manually. There is no need to deal with time stamps unrecognizable by humans, because it is impossible to verify the recognition correctness. For each character, 9000 images are randomly selected to train the network, and the remaining samples are used to test its recognition accuracy. The test results are shown in Table 1. From the table, the recognition accuracy of each character is over 98%, and it takes only about 6 seconds to recognize 1000 pictures.

Table 1: Test results of CNN identification.

Character  Total numbers  Recognition errors  Recognition rate  Time cost (s)
0          1000           20                  0.980             6.01
1          1000           4                   0.996             6.11
2          1000           8                   0.992             5.96
3          1000           3                   0.997             6.28
4          1000           1                   0.999             6.24
5          1000           19                  0.981             6.52
6          1000           12                  0.988             6.05
7          1000           3                   0.997             6.17
8          1000           12                  0.988             5.95
9          1000           13                  0.987             6.11
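A minimal Keras sketch of the described architecture is shown below, assuming TensorFlow is available. It is our reading of Figure 7 (which shows 12 feature maps after Con_2, while the text above says 10 kernels; the sketch follows the figure), not the authors' original implementation; the fully connected width and the optimizer are assumptions.

```python
from tensorflow.keras import layers, models

# Architecture per Figure 7: 28x28 input -> Con_1 (6@24x24) -> Pool_1 (6@12x12)
# -> Con_2 (12@8x8) -> Pool_2 (12@4x4) -> fully connected -> softmax over 0-9.
model = models.Sequential([
    layers.Conv2D(6, 5, activation="relu", input_shape=(28, 28, 1)),  # Con_1
    layers.MaxPooling2D(pool_size=2),                                 # Pool_1
    layers.Conv2D(12, 5, activation="relu"),                          # Con_2
    layers.MaxPooling2D(pool_size=2),                                 # Pool_2
    layers.Flatten(),
    layers.Dense(84, activation="relu"),         # F (width is an assumption)
    layers.Dense(10, activation="softmax"),      # one class per digit
])

# Training steps 1-3: labeled 28x28 character images as X, digit labels as Y,
# trained by forward and back propagation (optimizer choice is assumed).
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
```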
At present, the commonly used methods for character recognition are Optical Character Recognition (OCR) [20] and character recognition based on deep neural networks. It is well known that OCR recognizes standard characters effectively, so we ran an experiment based on the open recognition engine TESSERACT [21], training and testing it in the same way as the CNN. The test results in Table 2 show that the highest recognition accuracy is 96.8%, the lowest is 93.2%, and the lowest time cost for testing 1000 samples is 8.23 seconds. The convolutional neural network, by contrast, has higher recognition accuracy and lower time cost. The reason for the relatively low recognition accuracy of OCR is that characters extracted from time stamps are affected by interference, such as illumination and background noise, as shown in Figure 8, which is hard for OCR to handle. So it can be concluded from the comparative experiments that the CNN has better robustness, stronger interference resistance, and lower time consumption than OCR.

Table 2: Test results of TESSERACT.

Character  Total numbers  Recognition errors  Recognition accuracy rate  Time cost (s)
0          1000           57                  0.943                      8.95
1          1000           45                  0.955                      8.32
2          1000           53                  0.947                      8.28
3          1000           50                  0.950                      8.84
4          1000           59                  0.941                      8.23
5          1000           40                  0.960                      8.44
6          1000           49                  0.951                      8.52
7          1000           32                  0.968                      8.39
8          1000           68                  0.932                      8.45
9          1000           60                  0.940                      8.69

Figure 8: Character samples affected by illumination and background noise.

3.3. Date Check. After the hour and minute in the time stamp are identified, another important step is to complete the date information (year, month, and day). Since the dates of the photos may not be continuous and cannot all be filled in automatically by the program, it is necessary to confirm the dates manually. Although the dates are not continuous, they are all in order, and the volume number of the film, which is recorded in the folder name, helps to determine the range of dates. In addition, the photographing time is mostly continuous and the 24-hour timekeeping method is used, so it is easy to judge whether the date has changed. For example, if the time information of the first picture is "2359" and that of the second picture is "0000," the date of the second picture is one day later than that of the first. So, for images over a period of time, only the observation date of the first picture needs to be known. However, some dates are not continuous, so a manual check is required, and we adopt a graphical user interface (Figure 9) to assist in the date confirmation. Only the first few pictures of a day need to be verified. If the date is incorrect, it is modified manually, and the program automatically updates the dates of all subsequent pictures.

To use the interface, fill in the paths of the original images and the record table in the corresponding text boxes of the program. Click the "Open" button to open the first image in the folder; its date information is displayed in the corresponding text boxes. Click the "Next" or "Last" button to open the next or previous image, respectively. Click the "Update" button to update the date. The "Next day" button jumps directly to the next day. Finally, the updated contents are saved in the corresponding files.

Figure 9: Graphic interface of the date check.
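The day-rollover rule described above can be expressed compactly. The sketch below is a hypothetical helper (the function name and signature are ours, not from the paper) that advances the date whenever the HHMM time of a picture wraps around past midnight relative to the previous one.

```python
from datetime import date, timedelta

def advance_date(current: date, prev_hhmm: str, next_hhmm: str) -> date:
    """Advance the date by one day when the 24-hour time wraps around,
    e.g. prev "2359" followed by next "0000" on the following day."""
    if int(next_hhmm) < int(prev_hhmm):
        return current + timedelta(days=1)
    return current

d = date(1968, 1, 24)                  # date of the first picture of the day
d = advance_date(d, "2359", "0000")    # -> 1968-01-25
```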
4. Result and Discussion

To further test the recognition accuracy of the network under actual conditions, we randomly selected 10,000 original images for testing. Table 3 shows the accuracy of the test results recognized by the CNN, confirmed manually. Misrecognizing 1 character occurs 202 times, misrecognizing 2 characters occurs 10 times, and misrecognizing 3 or more characters simultaneously never occurs. The recognition accuracy rate is 97.9%, and the average time taken for each picture is 0.09 seconds. The statistics of the recognition results for each character are shown in Table 4.

Table 3: Statistical table of recognition results.

         Correct  1 error  2 errors  3 errors  4 errors  Average time cost (s)  Recognition accuracy rate
Numbers  9788     202      10        0         0         0.09                   97.9%

Table 4: Confusion matrix of character recognition results (rows: recognized character; columns: true character).

Character         0     1      2     3     4     5     6     7     8     9
0                 4558  0      2     108   0     0     16    0     0     1
1                 0     11147  0     0     0     0     0     0     0     0
2                 0     0      4826  0     0     0     0     0     0     0
3                 0     1      85    3890  0     1     0     0     0     0
4                 0     0      0     0     4176  2     0     0     0     0
5                 0     0      0     0     0     4362  0     0     0     0
6                 0     0      0     0     0     0     1299  0     0     0
7                 0     0      0     0     0     0     0     1918  0     0
8                 0     0      0     0     0     0     0     1     1846  0
9                 0     0      0     0     0     0     0     0     1     1760
Total             4558  11148  4913  3998  4176  4365  1315  1919  1847  1761
Number of errors  0     1      87    108   0     3     16    1     1     1
Recognition rate  1     0.99   0.98  0.97  1     0.99  0.99  0.99  0.99  0.99

Table 4 shows that the recognition accuracy of the character "0" is 100%, that of "1," "5," and "7" is greater than 99.9%, and that of the other characters is above 97.3%. The average recognition accuracy over all characters is 99.5%. However, the recognition error rates of the characters "2," "3," and "6" are higher, mainly because these characters are affected by light, as shown in Figure 10. When they are affected by illumination, they are easily destroyed by the local binarization algorithm, leading to structural breaks. The character fragments are then treated as noise in the next step of the algorithm because of their small area, which affects the recognition results (e.g., Figures 10(b) and 10(d)).

Figure 10: Original images (a, c) and segmentation results (b, d). The third and fourth characters are not completely segmented in (b), and the fourth character is not completely segmented in (d), resulting in incorrect recognition results.

However, the images affected by lighting account for only a small part of the whole sample, as shown in Table 4, so they contribute little to the average recognition accuracy. Besides, the recognition results of some characters, such as "8" and "9," are not affected by illumination, as shown in Figure 11. When there is lighting interference on these images, their main structures are preserved, so their recognition results are unaffected. Such defective structures can still be identified, which is one of the advantages of the CNN.

Figure 11: Original images (a, c) and segmentation results (b, d), respectively. The recognition result is correct, even though the fourth character is partially split.
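As a worked check of Table 4, the snippet below recomputes the per-character recognition rates and the overall accuracy from the confusion matrix (rows are recognized characters, columns are true characters); the values are copied from the table, and the snippet is an illustration, not part of the paper's pipeline.

```python
import numpy as np

# Confusion matrix from Table 4: rows = recognized digit, columns = true digit.
cm = np.array([
    [4558, 0,     2,    108,  0,    0,    16,   0,    0,    1],
    [0,    11147, 0,    0,    0,    0,    0,    0,    0,    0],
    [0,    0,     4826, 0,    0,    0,    0,    0,    0,    0],
    [0,    1,     85,   3890, 0,    1,    0,    0,    0,    0],
    [0,    0,     0,    0,    4176, 2,    0,    0,    0,    0],
    [0,    0,     0,    0,    0,    4362, 0,    0,    0,    0],
    [0,    0,     0,    0,    0,    0,    1299, 0,    0,    0],
    [0,    0,     0,    0,    0,    0,    0,    1918, 0,    0],
    [0,    0,     0,    0,    0,    0,    0,    1,    1846, 0],
    [0,    0,     0,    0,    0,    0,    0,    0,    1,    1760],
])
per_char = cm.diagonal() / cm.sum(axis=0)   # correct / total per true digit
overall = cm.diagonal().sum() / cm.sum()    # overall accuracy
print(per_char.round(3))   # e.g. "3" -> 3890/3998 = 0.973
print(round(overall, 4))   # ~0.995, consistent with the 99.5% in the text
```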
Although the images affected by lighting account for only a small part, our further plan to solve this problem is to add some samples affected by lighting to the training set and to improve the character segmentation algorithm.

In total, we obtained date/time information for more than 7 million pictures spanning 38 years, as shown in Table 5. The remaining unprocessed images, such as those of 1971, 1986, and 1990, have time stamps that are beyond human recognition or have no time stamps at all, about 10% of the total. It is not necessary to deal with these pictures because it is impossible to verify whether they are recognized correctly. The number of pictures per year is also shown as a bar chart in Figure 12. The number of pictures rose slowly from 1963 to 1967, peaking in 1967 with about 700 thousand pictures. After 1967, the number of pictures declined dramatically; in 2003, there were about 13,000 pictures.

Table 5: Annual total amount of images.

Year  Number    Year  Number    Year   Number
1963  124469    1977  190870    1991   109306
1964  474370    1978  207012    1992   185144
1965  539969    1979  293451    1993   121401
1966  559126    1980  302258    1994   86425
1967  683452    1981  236533    1995   98994
1968  489625    1982  188228    1996   46638
1969  584851    1983  178873    1997   82101
1970  46047     1984  174902    1998   76434
1971  0         1985  32324     1999   37270
1972  16624     1986  0         2000   34679
1973  350606    1987  113652    2001   68460
1974  289724    1988  217226    2002   43292
1975  213868    1989  67403     2003   12983
1976  182321    1990  0         Total  7760911

Figure 12: Number of pictures per year.

5. Conclusion

In this paper, we describe an intelligent algorithm based on CNN to extract the time stamps from traditional films. The experimental results show that the method performs well and meets the speed and quality requirements of the identification task. It also has strong portability for solving the same type of problem in similar applications. Finally, we obtained date/time information for more than 7 million pictures recorded by the NSO of the US. This greatly reduces the amount of manual work, so that this batch of data can be effectively utilized by researchers as soon as possible. The method proposed in this paper can also be applied to character recognition in other historical images, such as handwritten character recognition in sunspot drawings.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request. In the future, the data used to support the findings of this study will be published online.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China under Grants U1731124, U1531247, 11427803, 11427901, and 11873062, the 13th Five-year Informatization Plan of the Chinese Academy of Sciences under Grant XXH13505-04, and the Beijing Municipal Science and Technology Project under Grant Z181100002918004. Haimin Wang acknowledges the support of the US NSF under grant AGS-1620875. The authors are grateful to the National Solar Observatory for providing the original film data.

References

[1] G. Ai and Y. Hu, "On principle of solar magnetic field telescope," Acta Astronomica Sinica, vol. 2, pp. 91–98, 1986.
[2] https://www.noao.edu/outreach/kptour/mcmath.html.
[3] R. Liu, C. Liu, Y. Xu, W. Liu, B. Kliem, and H. Wang, "Observation of a Moreton wave and wave-filament interactions associated with the renowned X9 flare on 1990 May 24," The Astrophysical Journal, vol. 773, no. 2, p. 166, 2013.
[4] H. Wang and C. Liu, "Circular ribbon flares and homologous jets," The Astrophysical Journal, vol. 760, no. 2, p. 101, 2012.
[5] D. Nasien, H. Haron, and S. S. Yuhaniz, "Support vector machine (SVM) for English handwritten character recognition," in Proceedings of the Second International Conference on Computer Engineering and Applications, IEEE, vol. 1, pp. 249–252, 2010.
[6] I. H. Witten, E. Frank, M. A. Hall et al., Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, Burlington, MA, USA, 2016.
[7] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[8] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the International Conference on Neural Information Processing Systems, pp. 1097–1105, Curran Associates Inc., Lake Tahoe, NV, USA, December 2012.
[10] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, "Face recognition: a convolutional neural-network approach," IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 98–113, 1997.
[11] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation from predicting 10,000 classes," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1891–1898, June 2014.
[12] G. Hinton, L. Deng, D. Yu et al., "Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
[13] S. Zheng, X. Zeng, G. Lin et al., "Sunspot drawings handwritten character recognition method based on deep learning," New Astronomy, vol. 45, pp. 54–59, 2016.
[14] I. J. Goodfellow, Y. Bulatov, J. Ibarz et al., "Multi-digit number recognition from street view imagery using deep convolutional neural networks," 2013, https://arxiv.org/abs/1312.
[15] Y. Kim, "Convolutional neural networks for sentence classification," 2014.
[16] J. Gu, Z. Wang, J. Kuen et al., "Recent advances in convolutional neural networks," Pattern Recognition, vol. 77, pp. 354–377, 2018.
[17] J. Sauvola and M. Pietikäinen, "Adaptive document image binarization," Pattern Recognition, vol. 33, no. 2, pp. 225–236, 2000.
[18] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," in Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2963–2970, San Francisco, CA, USA, June 2010.
[19] Y. Li and H. Lu, "Scene text detection via stroke width," in Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), pp. 681–684, IEEE, Tsukuba, Japan, November 2012.
[20] K. M. Mohiuddin and J. Mao, "Optical character recognition," in Wiley Encyclopedia of Electrical and Electronics Engineering, Wiley, 1999.
[21] R. Smith, "An overview of the Tesseract OCR engine," in Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), IEEE Computer Society, Curitiba, Parana, Brazil, September 2007.
