Full-Disk Solar Flare Forecasting Model Based on Data Mining Method

Hindawi
Advances in Astronomy
Volume 2019, Article ID 5190353, 6 pages
https://doi.org/10.1155/2019/5190353

Research Article

Rong Li (1) and Yong Du (2)

(1) School of Information, Beijing Wuzi University, Beijing 101149, China
(2) Department of Electrical and Information Engineering, Northeast Agricultural University, Harbin, China

Correspondence should be addressed to Rong Li; lirong@bao.ac.cn

Received 11 April 2019; Accepted 18 June 2019; Published 1 August 2019

Guest Editor: Liyun Zhang

Copyright © 2019 Rong Li and Yong Du. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Solar flares are among the most violent solar eruptive phenomena, and many solar flare forecasting models are built on the properties of active regions. However, most of these models only focus on active regions within 30° of solar disk center because of the projection effect. Using a cost sensitive decision tree algorithm, we build two solar flare forecasting models from the active regions within 30° of solar disk center and outside 30° of solar disk center, respectively. The performances of these two models are compared and analyzed. Merging these two models into a single one, we obtain a full-disk solar flare forecasting model.

1. Introduction

Solar activities are the primary source of space weather. As one of the important solar eruptive phenomena, solar flares, with their associated electromagnetic radiation and energetic particles, often interfere with geostationary satellites, communication systems, and even power grids [1]. Therefore, solar flare forecasting is a significant topic in the space weather forecasting community.

Because the trigger mechanisms of solar flares are unsolved, current solar flare forecasting depends only on probabilistic models. Statistical and data mining methods are used to build solar flare forecasting models. Miller (1989) developed an expert system (WOLF) to forecast the probable occurrence of solar flares [2]. McIntosh (1990) summarized the McIntosh classifications of sunspots and built an expert system (Theo) to forecast X-ray flares [3]. Long after this work, the McIntosh classifications are still considered a guide in forecasting solar flares in many space weather prediction centers. Measuring the contributions of the McIntosh classifications to solar flare forecasting, Bornmann and Shaw (1994) built a solar flare forecasting model using multiple linear regression analysis [4]. Wheatland (2004) pointed out that the history of solar flares is also an important indicator for the occurrence of solar flares, so a Bayesian approach was proposed to forecast solar flares using the previous flaring records [5]. Leka and Barnes (2007) applied discriminant analysis to produce a binary categorization for the flaring and nonflaring regions [6], and this approach was extended to a probabilistic forecast in Barnes et al. (2007) [7].

Data mining methods also have a long history of application in solar flare forecasting. Bradshaw et al. (1989) trained a three-layer neural network to forecast flares [8]. Wang et al. (2008) built a solar flare forecasting model supported with an artificial neural network based on the solar magnetic field parameters [9]. Li et al. (2007) proposed a data mining method combining the support vector machine and the k-nearest neighbors to train a solar flare forecasting model [10]. Qahwaji and Colak (2007) built a hybrid system that combines a support vector machine and a cascade-correlation neural network for solar flare forecasting [11]. The sequential information of active regions is analyzed in [12–16]. The active longitudes information is used to improve the performance of solar flare forecasting in [17]. At present, deep learning methods have been used to build solar flare forecasting models [18, 19].

Because of the projection effect of solar magnetograms, active regions within 30° of solar disk center, where the projection effect is negligible, are usually selected to extract parameters and then to build the forecasting model. However, active regions located outside 30° of solar disk center also produce solar flares. In the present work, we collect the data for active regions outside 30° of solar disk center and their related solar flares and build a solar flare forecasting model from this dataset. Combining it with the solar flare forecasting model trained from active regions within 30° of solar disk center, we obtain a full-disk solar flare forecasting model shown in Figure 1.

Figure 1: Full-disk solar flare forecasting model. (Active regions within 30° feed forecasting model 1, active regions outside 30° feed forecasting model 2, and the two models together form the full-disk flare forecasting model.)

The paper is organized as follows. In Section 2, we introduce active region parameters and the related flare catalog. In Section 3, we describe the data mining method. In Section 4, we estimate the performance of the solar flare forecasting model. Finally, in Section 5, we give a brief summary of this work.

2. Data

2.1. Active Region Data. The Solar Dynamics Observatory (SDO) satellite was launched in February 2010. The Helioseismic and Magnetic Imager (HMI), which is one of three instruments aboard the SDO, measures the full-disk photospheric vector magnetic field [20]. In 2014, a data product called Space Weather HMI Active Region Patches (SHARP) was introduced; it automatically identifies active regions using the vector magnetic field data when these active regions cross the solar disk [21]. For this study, we use the active region vector magnetic field data generated by the SDO's SHARP data patches from August 2011 to July 2012. We calculate 4 physical parameters using these 12 months of vector magnetic field data and obtain 2966 samples, including 1436 samples within 30° of solar disk center and 1530 samples outside 30° of solar disk center.

The 4 physical parameters are:

(1) The maximum horizontal gradient of the longitudinal magnetic field: this parameter estimates the maximum squeezing among flux systems in an active region.

(2) The length of neutral lines: the neutral lines separate opposite polarities of the longitudinal magnetic field [22].

(3) The number of singular points: it is the number of nodes in the network formed by magnetic separatrices [22].

(4) Sum of photospheric magnetic free energy:

$$\rho_{sum} = \sum \left( B^{obs} - B^{pot} \right) \tag{1}$$

$\rho_{sum}$ measures the nonpotentiality of an active region.

2.2. Flare Data. According to the peak flux of 1 to 8 angstrom X-rays, solar flares are classified into the class levels shown in Table 1. Within a class level, there is a linear scale from 1 to 9. For example, a C2 flare is twice as powerful as a C1 flare.

Table 1: Classifications of solar X-ray flares.

Class level    Peak flux of 1 to 8 angstrom X-rays (Watts/square metre)
A              < 10^-7
B              10^-7 to 10^-6
C              10^-6 to 10^-5
M              10^-5 to 10^-4
X              > 10^-4

Solar flares whose Geostationary Operational Environmental Satellite (GOES) X-ray flux peak magnitude is larger than the C1.0 level are considered in the present work. Solar flare data is collected from the National Geophysical Data Center GOES X-ray flux flare catalogs. An active region is considered a flaring sample when it produces a flare whose level is larger than C1.0 within 48 hours after the observation of this active region. Otherwise, the active region is considered a nonflaring sample. As such, there are 74 flaring samples and 1362 nonflaring samples for active regions within 30° of solar disk center, and there are 101 flaring samples and 1429 nonflaring samples for active regions outside 30° of solar disk center.

3. Method

3.1. Basic Algorithm. A decision tree is a flowchart-like model that shows the various outcomes from a series of decisions. It can be used for research analysis or for building a forecasting model.

Decision trees have three main parts: a root node, leaf nodes, and branches. The root node is the starting point and contains the questions or criteria to be answered, and leaf nodes stand for the decision of the model. Branches are arrows connecting nodes, showing the information flow between the nodes.

The decision tree algorithm is used to build the solar flare forecasting model. This means that the forecasting model is represented by a tree-like structure shown in Figure 2 [23]. The decision tree consists of testing nodes, leaf nodes, and branches. A sample is classified starting from the root node. The specified parameter of this node is calculated, and the sample is moved down along the corresponding branch until it finally reaches the leaf node where the classification result is determined.

Figure 2: Structure of decision tree. (A root node connects through branches to internal nodes, which connect through further branches to leaf nodes.)

The decision tree is constructed from the training set recursively. In each step, the best parameter is selected to generate the test node and the corresponding branches. The parameter is evaluated by the information gain ratio

$$GR(D, F) = \frac{IG(D, F)}{H(F)} \tag{2}$$

where D stands for the decision of the model, F stands for the feature of the model, IG(D, F) = H(D) - H(D|F) is the information gain (IG), and H stands for the entropy, which is used to measure the uncertainty of a system.

The training dataset is divided into subsets according to the value of the branches. This process is repeated until one of the following stop criteria is satisfied: (1) samples in the subset have the same class label or (2) all possible tests have the same class distribution [24]. When a stop criterion is satisfied, the leaf node is generated. The class label assigned to the leaf node is that of the majority of samples in this leaf node.

3.2. Cost Sensitive Modification for the Basic Algorithm. As shown in Section 2, the ratio between nonflaring samples and flaring samples is about 16. This is called the class imbalance problem in the data mining community. In order to treat the class imbalance problem, we modified the basic algorithm to a cost sensitive one [25].

In the basic decision tree algorithm, the probability is a basic component used to calculate the entropy, information gain, and information gain ratio. Generally, the probability is estimated by the frequency calculated from the dataset:

$$P(D = d) = \frac{|D = d|}{|D|} \tag{3}$$

where |D| is the number of samples in set D and |D = d| is the number of samples with class label d in set D.

In the cost sensitive algorithm, there are different costs for different class labels. For example, for a binary classification problem, the cost for class $d_0$ is $C_0$, and the cost for class $d_1$ is $C_1$. Thus, the probability for the cost sensitive problem can be estimated as follows:

$$P_{cost}(D = d_0) = \frac{|D = d_0| \times C_0}{|D = d_0| \times C_0 + |D = d_1| \times C_1} \tag{4}$$

$$P_{cost}(D = d_1) = \frac{|D = d_1| \times C_1}{|D = d_0| \times C_0 + |D = d_1| \times C_1} \tag{5}$$

In fact, the usual probability is recovered from the cost sensitive probability when the costs $C_0$ and $C_1$ are both set to 1. Using the cost sensitive probability, we can calculate the cost sensitive entropy and information gain. In the same way as for the basic decision tree algorithm, we can then build the cost sensitive decision tree model.

4. Performance

4.1. Performance Metrics. For a binary forecasting model, the results can be summarized in the contingency table shown in Table 2. A flaring sample is called a positive sample, and a nonflaring sample is called a negative sample. An actual positive sample correctly forecast as positive is a true positive (TP), an actual positive sample wrongly forecast as negative is a false negative (FN), an actual negative sample correctly forecast as negative is a true negative (TN), and an actual negative sample wrongly forecast as positive is a false positive (FP).

Table 2: Definition of contingency table.

                   Forecast positive    Forecast negative
Actual positive    N_TP                 N_FN
Actual negative    N_FP                 N_TN

Using the contingency table, 3 performance metrics are defined to compare the performance of the forecasting models. The TP rate and TN rate are defined to evaluate the accuracy on flaring samples and nonflaring samples, respectively:

$$TP\ rate = \frac{N_{TP}}{N_{TP} + N_{FN}} \tag{6}$$

$$TN\ rate = \frac{N_{TN}}{N_{TN} + N_{FP}} \tag{7}$$

The Heidke skill score (HSS) is used to evaluate the increase in forecasting power over that of a random forecast:

$$HSS = \frac{PC - E}{1 - E} \tag{8}$$

where $PC = (N_{TP} + N_{TN}) / (N_{TP} + N_{FP} + N_{FN} + N_{TN})$ and

$$E = \frac{(N_{TP} + N_{FP})(N_{TP} + N_{FN})}{(N_{TP} + N_{FP} + N_{FN} + N_{TN})^2} + \frac{(N_{TN} + N_{FP})(N_{TN} + N_{FN})}{(N_{TP} + N_{FP} + N_{FN} + N_{TN})^2} \tag{9}$$

4.2. Results. There are 2966 samples in the dataset. In order to make good use of this data, the leave-one-out cross validation method is used to evaluate the performance of the forecasting model. In this method, all but one of the samples are used as the training set, and the remaining sample is used as the testing set. The process is repeated as many times as there are samples in the dataset. Leave-one-out cross validation does not waste data; however, it is computationally expensive. The cost sensitive decision tree is an efficient algorithm, so we can complete the leave-one-out testing. The cost for flaring samples is 50 times larger than that for nonflaring samples.

In order to simplify the following discussion, the solar flare forecasting model learned from samples within 30° of solar disk center is called model 1, and the solar flare forecasting model learned from samples outside 30° of solar disk center is called model 2. The contingency tables of model 1 and model 2 are shown in Tables 3 and 4. Based on these contingency tables, the performances of the two forecasting models can be compared by the performance metrics shown in Table 5.

Table 3: Contingency table for model 1.

                   Forecast positive    Forecast negative
Actual positive    53                   21
Actual negative    161                  1201

Table 4: Contingency table for model 2.

                   Forecast positive    Forecast negative
Actual positive    78                   23
Actual negative    464                  965

Table 5: Performances of solar flare forecasting models.

           TP rate    TN rate    HSS
Model 1    71.6%      88.2%      0.316
Model 2    77.2%      67.5%      0.148

From Table 5, we find that the performance of model 2 is worse than that of model 1 (lower TN rate and HSS), because the physical parameters used in model 2 could be influenced by the projection effect. However, the performance of model 2 is acceptable. Combining model 1 and model 2, we obtain a full-disk solar flare forecasting model.

At present, little work provides forecasting results for solar flares in active regions beyond 30 degrees of solar disk center; hence, we choose the forecasting results for active regions within 30 degrees and compare them with the flare forecasting results provided by the convolution neural network [24]. The results are shown in Table 6. We find that the flare forecasting model built by the convolution neural network has a higher TP rate, while our forecasting model has a higher TN rate. Because the proportions of flaring samples and nonflaring samples are different in the two testing datasets, the HSS values are not comparable.

Table 6: Performance comparisons.

Performance index    Decision tree    CNN
TP rate              71.6%            85.0%
TN rate              88.2%            81.0%
HSS                  0.316            0.143

5. Conclusion

The Space Weather HMI Active Region Patches data product automatically identifies active regions when they cross the solar disk. We classify the active region samples into two groups by their location. The active region samples located within 30° of solar disk center are classified into group one, and the rest of the samples are classified into group two. The projection effect on the samples in group one is negligible, but the magnetic parameters extracted from active regions in group two are less accurate because of the projection effect. Two solar flare forecasting models are built using the data mining method from the two groups of samples, respectively, and the performances of these two forecasting models are estimated. The performance of the forecasting model learned from samples within 30° of solar disk center is better than that of the forecasting model learned from the other samples, because the parameters extracted from the active regions outside 30° of solar disk center are not accurate enough, and uncertainty is introduced into the evaluation of the nonpotentiality of these active regions. A full-disk solar flare forecasting model is generated by combining the two models together.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The data used herein was made possible by funding to NWRA from NASA/LWS contract NNH09CE72C (Dr. Graham Barnes, PI). This work is supported by the National Natural Science Foundation of China (NSFC) (Grant No. 11303051), Beijing Intelligent Logistics System Collaborative Innovation Center, and Beijing Key Laboratory (No. BZ0211).

References

[1] G. Ai, H. Wang, and J. Wang, “What is a solar electromagnetic storm?” Space Weather Journal, vol. 10, no. 9, 2012.
[2] A. Heck and F. Murtagh, “Knowledge-based systems in astronomy,” in Lecture Notes in Physics, 1989, 329.
[3] P. S. McIntosh, “The classification of sunspot groups,” Solar Physics, vol. 125, no. 2, pp. 251–267, 1990.
[4] P. L. Bornmann and D. Shaw, “Flare rates and the McIntosh active-region classifications,” Solar Physics, vol. 150, no. 1-2, pp. 127–146, 1994.
[5] M. S. Wheatland, “A Bayesian approach to solar flare prediction,” The Astrophysical Journal, vol. 609, p. 1134, 2004.
[6] K. D. Leka and G. Barnes, “Photospheric magnetic field properties of flaring versus flare-quiet active regions. IV. A statistically significant sample,” The Astrophysical Journal, vol. 656, no. 2, p. 1173, 2007.
[7] G. Barnes, K. D. Leka, E. A. Schumer, and D. J. Della-Rose, “Probabilistic forecasting of solar flares from vector magnetogram data,” Space Weather, vol. 5, no. 9, p. S09002, 2007.
[8] G. Bradshaw, R. Fozzard, and L. Ceci, “A connectionist expert system that actually works,” Adv Neu Inform Proc Sys, vol. 1, pp. 248–255, 1989.
[9] H. N. Wang, Y. M. Cui, R. Li, L. Y. Zhang, and H. Han, “Solar flare forecasting model supported with artificial neural network techniques,” Advances in Space Research, vol. 42, no. 9, p. 1464, 2008.
[10] R. Li, H. N. Wang, H. He, Y. M. Cui, and Z. L. Du, “Support vector machine combined with k-nearest neighbors for solar flare forecasting,” Chinese Journal of Astronomy and Astrophysics, vol. 7, no. 3, p. 441, 2007.
[11] R. Qahwaji and T. Colak, “Automatic short-term solar flare prediction using machine learning and sunspot associations,” Solar Physics, vol. 241, p. 195, 2007.
[12] D. Yu, X. Huang, H. Wang, and Y. Cui, “Short-term solar flare prediction using a sequential supervised learning method,” Solar Physics, vol. 255, no. 1, pp. 91–105, 2009.
[13] D. Yu, X. Huang, Q. Hu, R. Zhou, H. Wang, and Y. Cui, “Short-term solar flare prediction using multiresolution predictors,” The Astrophysical Journal, vol. 709, no. 1, p. 321, 2010.
[14] D. Yu, X. Huang, H. Wang, Y. Cui, Q. Hu, and R. Zhou, “Short-term solar flare level prediction using a Bayesian network approach,” The Astrophysical Journal, vol. 710, no. 1, p. 869, 2010.
[15] X. Huang, D. Yu, Q. Hu, H. Wang, and Y. Cui, “Short-term solar flare prediction using predictor teams,” Solar Physics, vol. 263, no. 1-2, pp. 175–184, 2010.
[16] X. Huang and H. N. Wang, “Solar flare prediction using highly stressed longitudinal magnetic field parameters,” Research in Astronomy and Astrophysics, vol. 13, no. 3, pp. 351–358, 2013.
[17] X. Huang, L. Zhang, H. Wang, and L. Li, “Improving the performance of solar flare prediction using active longitudes information,” Astronomy & Astrophysics, vol. 549, article A127, p. 6, 2013.
[18] X. Huang, H. Wang, L. Xu, J. Liu, R. Li, and X. Dai, “Deep learning based solar flare forecasting model. I. Results for line-of-sight magnetograms,” The Astrophysical Journal, vol. 856, no. 1, p. 7, 2018.
[19] N. Nishizuka, K. Sugiura, Y. Kubo, M. Den, and M. Ishii, “Deep Flare Net (DeFN) model for solar flare prediction,” The Astrophysical Journal, vol. 858, no. 2, 2018.
[20] J. Schou, P. H. Scherrer, R. I. Bush et al., “Design and ground calibration of the Helioseismic and Magnetic Imager (HMI) instrument on the Solar Dynamics Observatory (SDO),” Solar Physics, vol. 275, no. 1-2, pp. 229–259, 2012.
[21] M. G. Bobra, X. Sun, J. T. Hoeksema et al., “The Helioseismic and Magnetic Imager (HMI) vector magnetic field pipeline: SHARPs – Space-weather HMI Active Region Patches,” Solar Physics, vol. 289, no. 9, pp. 3549–3578, 2014.
[22] Y. Cui, R. Li, L. Zhang, Y. He, and H. Wang, “Correlation between solar flare productivity and photospheric magnetic field properties,” Solar Physics, vol. 237, p. 45, 2006.
[23] X. Huang, H. N. Wang, and X. H. Dai, Science China Physics, Mechanics and Astronomy, vol. 55, no. 10, pp. 1956–1962.
[24] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, Calif, USA, 1993.
[25] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, San Mateo, Calif, USA, 2005.
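As an illustration of the flare labelling described in Section 2.2, the following Python sketch maps a GOES 1 to 8 angstrom peak flux to a class string using the thresholds of Table 1 and the linear scale within a class, and then labels a sample as flaring. It is not code from the study: the names `goes_class` and `is_flaring_sample` are hypothetical, the lower bound of 10^-8 W/m^2 for class A follows the usual GOES convention rather than Table 1, and treating a flare of exactly C1.0 as flaring is an assumption.

```python
from typing import Iterable

# Lower bound of each class in W/m^2, from Table 1 (B, C, M, X).
_CLASS_FLOORS = [("X", 1e-4), ("M", 1e-5), ("C", 1e-6), ("B", 1e-7)]
# Assumption: class A is scaled from 1e-8 W/m^2 (usual GOES convention).
_A_FLOOR = 1e-8
# C1.0 threshold used to decide whether a flare counts for a flaring sample.
C1_FLUX = 1e-6


def goes_class(peak_flux: float) -> str:
    """Return a class string such as 'C2.0' for a peak flux in W/m^2."""
    if peak_flux <= 0:
        raise ValueError("peak flux must be positive")
    for letter, floor in _CLASS_FLOORS:
        if peak_flux >= floor:
            # Linear scale inside a class: C2 is twice the flux of C1.
            return f"{letter}{peak_flux / floor:.1f}"
    return f"A{peak_flux / _A_FLOOR:.1f}"


def is_flaring_sample(peak_fluxes_next_48h: Iterable[float]) -> bool:
    """Label a sample as flaring if any flare in the 48 hours after the
    observation reaches the C1.0 level (assumed inclusive)."""
    return any(flux >= C1_FLUX for flux in peak_fluxes_next_48h)
```

For example, `goes_class(2e-6)` gives a C2.0 flare, which has twice the peak flux of a C1.0 flare, and a region whose only flare in the window is B-class is labelled nonflaring.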
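The cost sensitive probability of equations (4) and (5) can be sketched in a few lines of Python. The block below is only an illustration under stated assumptions: the function names are hypothetical, class d0 is taken to be the nonflaring class with cost C0 = 1 and class d1 the flaring class with cost C1 = 50 (the 50:1 ratio quoted in Section 4.2), and the sample counts are those of the active regions within 30° of solar disk center. How the paper weights child nodes when computing the conditional entropy is not specified, so only the probability and entropy steps are shown.

```python
import math
from typing import Sequence, Tuple


def cost_sensitive_probabilities(
    n0: int, n1: int, c0: float = 1.0, c1: float = 1.0
) -> Tuple[float, float]:
    """Equations (4) and (5): class probabilities weighted by class costs.

    With c0 == c1 == 1 this reduces to the frequency estimate of equation (3).
    """
    w0 = n0 * c0
    w1 = n1 * c1
    total = w0 + w1
    if total <= 0:
        raise ValueError("at least one sample with positive cost is required")
    return w0 / total, w1 / total


def entropy(probabilities: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Counts for active regions within 30 degrees of disk center (Section 2.2).
N_NONFLARING, N_FLARING = 1362, 74

# Plain frequency estimate versus the cost sensitive estimate (assumed 50:1).
plain = cost_sensitive_probabilities(N_NONFLARING, N_FLARING)
weighted = cost_sensitive_probabilities(N_NONFLARING, N_FLARING, c0=1.0, c1=50.0)

print(f"P(flaring): plain={plain[1]:.4f}, cost sensitive={weighted[1]:.4f}")
print(f"entropy:    plain={entropy(plain):.4f}, cost sensitive={entropy(weighted):.4f}")
```

With these counts the flaring class moves from a small minority in the plain estimate to the majority of the weighted probability mass, which is how the cost weighting counteracts the class imbalance when entropies are computed.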
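The performance metrics of equations (6) to (9) can be checked directly against the contingency tables. The Python sketch below (hypothetical names, not code from the study) computes the TP rate, TN rate, and HSS from the counts in Tables 3 and 4; under the definitions above it reproduces the values listed in Table 5 after rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """Counts of a 2x2 contingency table as defined in Table 2."""
    tp: int  # actual positive, forecast positive
    fn: int  # actual positive, forecast negative
    fp: int  # actual negative, forecast positive
    tn: int  # actual negative, forecast negative


def tp_rate(c: Contingency) -> float:
    """Equation (6): fraction of flaring samples forecast correctly."""
    return c.tp / (c.tp + c.fn)


def tn_rate(c: Contingency) -> float:
    """Equation (7): fraction of nonflaring samples forecast correctly."""
    return c.tn / (c.tn + c.fp)


def heidke_skill_score(c: Contingency) -> float:
    """Equations (8) and (9): skill relative to a random forecast."""
    n = c.tp + c.fn + c.fp + c.tn
    pc = (c.tp + c.tn) / n
    expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fp) * (c.tn + c.fn)
    ) / n**2
    return (pc - expected) / (1 - expected)


# Counts taken from Table 3 (model 1) and Table 4 (model 2).
MODEL_1 = Contingency(tp=53, fn=21, fp=161, tn=1201)
MODEL_2 = Contingency(tp=78, fn=23, fp=464, tn=965)

for name, table in (("Model 1", MODEL_1), ("Model 2", MODEL_2)):
    print(
        f"{name}: TP rate={tp_rate(table):.1%}, "
        f"TN rate={tn_rate(table):.1%}, HSS={heidke_skill_score(table):.3f}"
    )
```

Keeping the counts in a small immutable record makes it easy to add further tables, for example the merged full-disk table obtained by summing the counts of the two models.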
Advances in Astronomy, Volume 2019 – Aug 1, 2019

Publisher
Hindawi Publishing Corporation
Copyright
Copyright © 2019 Rong Li and Yong Du. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ISSN
1687-7969
eISSN
1687-7977
DOI
10.1155/2019/5190353
Publisher site
See Article on Publisher Site

Abstract

Hindawi Advances in Astronomy Volume 2019, Article ID 5190353, 6 pages https://doi.org/10.1155/2019/5190353 Research Article Full-Disk Solar Flare Forecasting Model Based on Data Mining Method 1 2 Rong Li and Yong Du School of Information, Beijing Wuzi University, Beijing 101149, China Department of Electrical and Information Engineering, Northeast Agricultural University, Harbin, China Correspondence should be addressed to Rong Li; lirong@bao.ac.cn Received 11 April 2019; Accepted 18 June 2019; Published 1 August 2019 Guest Editor: Liyun Zhang Copyright © 2019 Rong Li and Yong Du. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Solar flare is one of the violent solar eruptive phenomena; many solar flare forecasting models are built based on the properties of of solar disk center because of the projection active regions. However, most of these models only focus on active regions within 30 effect. Using cost sensitive decision tree algorithm, we build two solar flare forecasting models from the active regions within 30 of solar disk center and outside 30 of solar disk center, respectively. eTh performances of these two models are compared and analyzed. Merging these two models into a single one, we obtain a full-disk solar flare forecasting model. 1. Introduction proposed to forecast solar afl res using the previous flaring records [5]. Leka and Barnes (2007) applied discriminant Solar activities are the primary source of space weather. As analysis to produce a binary categorization for the flaring and one of the important solar eruptive phenomena, solar afl res nonflaring regions [6], and this approach was extended to a associated with the electromagnetic radiation and energetic probabilistic forecast in Barnes et al. (2007) [7]. 
particles often interfere with geostationary satellites, com- Data mining methods also have a long history for the munication systems, and even power grids [1]. Therefore, application in solar flare forecasting. Bradshaw et al. (1989) solar flare forecasting is a significant topic in space weather trained a three-layer neural network to forecast flares [8]. forecasting community. Wang et al. (2008) built a solar afl re forecasting model supported with an articfi ial neural network based on the solar Because the trigger mechanisms of solar flares are unsolved, the current solar flare forecasting only depends magnetic efi ld parameters [9]. Li et al. (2007) proposed a on the probabilistic model. The statistical and data mining data mining method combining the support vector machine and the k-nearest neighbors to train a solar ar fl e forecasting methods are used to build solar afl re forecasting models. Miller (1989) developed an expert system (WOLF) to forecast model [10]. Qahwaji and Colak (2007) built a hybrid system theprobableoccurrenceof solar afl res[2]. McIntosh (1990) that combines a support vector machine and a cascade- summarized the McIntosh classifications of sunspots and correlation neural network for solar flare forecasting [11]. The builtanexpertsystem(Theo)toforecastX-rayflares[3].Long sequential information of active regions is analyzed in [12– 16]. The active longitudes information is used to improve aer ft this work, the McIntosh classifications are considered as a guide in forecasting solar flares in many space weather the performance of solar ar fl e forecasting in [17]. At present, prediction centers. Measuring contributions of the McIntosh deep learning methods have been used to build solar flare forecasting models [18, 19]. classifications for solar flare forecasting, Bornmann and Shaw (1994) built a solar ar fl e forecasting model using multiple Because of the projection effect of solar magnetograms, liner regression analysis [4]. 
Wheatland (2004) pointed out active regions within 30 of solar disk center, where projec- that the history of solar flares is also an important indicator tion effect can be negligible, are usually selected to extract for the occurrence of solar afl res, so a Bayesian approach was parameters and furthermore to build the forecasting model. 2 Advances in Astronomy Full-disk flare forecasting model Active Forecasting regions model 1 within 30 Active Forecasting regions model 2 outside 30 Figure 1: Full-disk solar flare forecasting model. However, active regions which locate outside 30 of solar (4) Sum of photospheric magnetic free energy. disk center also produce solar flares. In the present work, 𝑜𝑏𝑠 𝑝𝑜𝑡 we collect the data for active regions outside 30 of solar (1) 𝜌 =∑(𝐵 −𝐵 ) sum disk center and their related solar flares and build a solar flare forecasting model from this dataset. Combining it with 𝜌 measures the nonpotentiality of an active region. sum the solar afl re forecasting model trained from active regions within 30 of solar disk center, we obtain a full-disk solar afl re 2.2. Flare Data. According to the peak u fl x of 1 to 8 angstrom forecasting model shown in Figure 1. X-rays, solar flare is classified as different class levels shown The paper is organized as follows. In Section 2, we intro- in Table 1. Within a class level, there is a linear scale from 1 to duce active region parameters and the related flare catalog. In 9. For example, a C2 flare is twice as powerful as a C1 flare. Section 3, we describe the data mining method. In Section 4, Solar flares whose Geostationary Operational Environ- we estimate the performance of the solar flare forecasting mental Satellite (GOES) X-ray u fl x peak magnitude is larger model. And finally, in Section 5, we give a brief summary of than the C1.0 level are considered in the present work. Solar this work. flare data is collected from the National Geophysical Data Center GOES X-ray ux fl flare catalogs. 
2. Data

2.1. Active Region Data. The Solar Dynamics Observatory (SDO) satellite was launched in February 2010. The Helioseismic and Magnetic Imager (HMI), one of three instruments aboard the SDO, measures the full-disk photospheric vector magnetic field [20]. A data product called Space Weather HMI Active Region Patches (SHARP), presented in 2014, automatically identifies active regions using the vector magnetic field data when these active regions cross the solar disk [21]. For this study, we use the active region vector magnetic field data generated by the SDO's SHARP data patches from 2011 August to 2012 July. We calculate 4 physical parameters using these 12 months of vector magnetic field data and obtain 2966 samples, including 1436 samples within 30° of solar disk center and 1530 samples outside 30° of solar disk center.

The 4 physical parameters are:

(1) The maximum horizontal gradient of the longitudinal magnetic field: this parameter estimates maximum squeezing among flux systems in an active region.

(2) The length of neutral lines: the neutral lines separate opposite polarities of the longitudinal magnetic field [22].

(3) The number of singular points: it is the number of nodes in the network formed by magnetic separatrices [22].

An active region is considered a flaring sample when it produces a flare whose level is larger than C1.0 within 48 hours after the observation of this active region. Otherwise, the active region is considered a nonflaring sample. As such, there are 74 flaring samples and 1362 nonflaring samples for active regions within 30° of solar disk center, and there are 101 flaring samples and 1429 nonflaring samples for active regions outside 30° of solar disk center.

Table 1: Classifications of solar X-ray flares.

Class level    Peak flux of 1 to 8 angstrom X-rays (Watts/square metre)
A              < 10^-7
B              10^-7 to 10^-6
C              10^-6 to 10^-5
M              10^-5 to 10^-4
X              > 10^-4

3. Method

3.1. Basic Algorithm. A decision tree is a flowchart-like model that shows the various outcomes from a series of decisions. It can be used for research analysis or for building a forecasting model.

Decision trees have three main parts: a root node, leaf nodes, and branches. The root node is the starting point and contains questions or criteria to be answered, and leaf nodes stand for the decision of the model. Branches are arrows connecting nodes, showing the information flow between the nodes.

The decision tree algorithm is used to build the solar flare forecasting model. This means that the forecasting model is represented by a tree-like structure, shown in Figure 2 [23]. The decision tree consists of testing nodes, leaf nodes, and branches. A sample is classified starting from the root node. The specified parameter of this node is calculated, and the sample is moved down along the corresponding branch until it finally reaches a leaf node, where the classification result is determined.

Figure 2: Structure of decision tree (a root node connected by branches to nodes, which are connected by branches to leaf nodes).

The decision tree is constructed from the training set recursively. In each step, the best parameter is selected to generate the test node and the corresponding branches. The parameter is evaluated by the information gain ratio

$$ GR(D,F) = \frac{IG(D,F)}{H(F)} \tag{2} $$

where D stands for the decision of the model, F stands for the feature of the model, IG(D,F) = H(D) - H(D|F) is the information gain (IG), and H stands for the entropy, which is used to measure the uncertainty of a system.

The training dataset is divided into subsets according to the value of branches. This process is repeated until the following stop criteria are satisfied: (1) samples in the subset have the same class label or (2) all possible tests have the same class distribution [24]. When the stop criteria are satisfied, the leaf node is generated. The class label of the leaf node is that of the majority of samples in this leaf node.

3.2. Cost Sensitive Modification for the Basic Algorithm. As shown in Section 2, the ratio between nonflaring samples and flaring samples is 16. This is called the class imbalance problem in the data mining community. In order to treat the class imbalance problem, we modified the basic algorithm to a cost sensitive one [25].

In the basic decision tree algorithm, the probability is a basic component used to calculate the entropy, information gain, and information gain ratio. Generally, the probability is estimated by the frequency calculated from the dataset:

$$ P(D=d) = \frac{|D=d|}{|D|} \tag{3} $$

where |D| is the number of samples in set D and |D=d| is the number of samples with class label d in set D.

In the cost sensitive algorithm, there are different costs for different class labels. For example, for a binary classification problem, the cost for class $d_0$ is $C_0$, and the cost for class $d_1$ is $C_1$. Thus, the probability for the cost sensitive problem can be estimated as follows:

$$ P_{cost}(D=d_0) = \frac{|D=d_0| \times C_0}{|D=d_0| \times C_0 + |D=d_1| \times C_1} \tag{4} $$

$$ P_{cost}(D=d_1) = \frac{|D=d_1| \times C_1}{|D=d_0| \times C_0 + |D=d_1| \times C_1} \tag{5} $$

In fact, the usual probability can be regarded as the cost sensitive probability when the costs $C_0$ and $C_1$ are set to 1. Using the cost sensitive probability, we can calculate the cost sensitive entropy and information gain. Similar to the basic decision tree algorithm, we can build the cost sensitive decision tree model.
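As an illustration of the cost sensitive probability in (4)-(5) and the gain ratio in (2), the following Python sketch may help. It is not the authors' implementation. The function names are hypothetical, and the sketch assumes that every sample is weighted by the cost of its class when the branch proportions and H(F) are computed, a detail the text does not specify.

```python
import math
from collections import Counter


def cost_sensitive_probs(labels, costs):
    """Class probabilities as in Eqs. (4)-(5): class counts weighted by class cost."""
    counts = Counter(labels)
    weighted = {d: n * costs[d] for d, n in counts.items()}
    total = sum(weighted.values())
    return {d: w / total for d, w in weighted.items()}


def entropy(probs):
    """Shannon entropy (base 2) of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def cost_entropy(labels, costs):
    """Entropy H(D) computed from the cost sensitive probabilities."""
    return entropy(cost_sensitive_probs(labels, costs).values())


def gain_ratio(labels, branches, costs):
    """GR(D,F) = IG(D,F) / H(F), Eq. (2), for one candidate split.

    labels   -- class label of each sample
    branches -- branch identifier each sample falls into under the split
    costs    -- mapping class label -> cost
    Assumption: each sample carries a weight equal to the cost of its class.
    """
    total_mass = sum(costs[d] for d in labels)
    groups = {}
    for d, f in zip(labels, branches):
        groups.setdefault(f, []).append(d)

    h_d_given_f = 0.0
    branch_probs = []
    for subset in groups.values():
        p_f = sum(costs[d] for d in subset) / total_mass
        branch_probs.append(p_f)
        h_d_given_f += p_f * cost_entropy(subset, costs)

    h_f = entropy(branch_probs)
    if h_f == 0:  # only one branch: the split carries no information
        return 0.0
    return (cost_entropy(labels, costs) - h_d_given_f) / h_f


# Sample counts for active regions within 30 degrees of disk center (Section 2),
# with label 1 = flaring and label 0 = nonflaring.
labels_within_30 = [1] * 74 + [0] * 1362
plain = cost_sensitive_probs(labels_within_30, {0: 1, 1: 1})
weighted = cost_sensitive_probs(labels_within_30, {0: 1, 1: 50})
print(round(plain[1], 3), round(weighted[1], 3))
```

With equal costs the flaring class has probability 74/1436, which is Eq. (3). With a cost of 50 for flaring samples the same class receives 3700/5062 of the probability mass, which is how the cost counteracts the class imbalance.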
4. Performance

4.1. Performance Metrics. For a binary forecasting model, the results can be summarized in the contingency table shown in Table 2. A flaring sample is called a positive one, and a nonflaring sample is called a negative one. An actual positive sample correctly forecasted as positive is called a true positive (TP), an actual positive sample wrongly forecasted as negative is a false negative (FN), an actual negative sample correctly forecasted as negative is a true negative (TN), and an actual negative sample wrongly forecasted as positive is a false positive (FP).

Table 2: Definition of contingency table.

                   Forecast positive    Forecast negative
Actual positive    N_TP                 N_FN
Actual negative    N_FP                 N_TN

Using the contingency table, 3 performance metrics are defined to compare the performance of the forecasting models. The TP rate and TN rate are defined to evaluate the accuracy for flaring samples and nonflaring samples, respectively.

$$ \text{TP rate} = \frac{N_{TP}}{N_{TP}+N_{FN}} \tag{6} $$

$$ \text{TN rate} = \frac{N_{TN}}{N_{TN}+N_{FP}} \tag{7} $$

The Heidke skill score (HSS) is used to evaluate the increase in forecasting power over that of a random forecast:

$$ HSS = \frac{PC - E}{1 - E} \tag{8} $$

where $PC = (N_{TP}+N_{TN})/(N_{TP}+N_{FP}+N_{TN}+N_{FN})$ and

$$ E = \frac{(N_{TP}+N_{FN})(N_{TP}+N_{FP})}{(N_{TP}+N_{FP}+N_{TN}+N_{FN})^2} + \frac{(N_{TN}+N_{FN})(N_{TN}+N_{FP})}{(N_{TP}+N_{FP}+N_{TN}+N_{FN})^2}. \tag{9} $$

4.2. Results. There are 2966 samples in the dataset. In order to make good use of this data, the leave-one-out cross validation method is used to evaluate the performance of the forecasting model. In this method, all but one of the samples are used as the training set, and only one sample is used as the testing set. The process is repeated as many times as the number of samples in the dataset. The leave-one-out cross validation method does not waste data; however, it is computationally expensive. The cost sensitive decision tree is an efficient algorithm, so we can complete the leave-one-out testing. The cost for flaring samples is 50 times larger than that for nonflaring samples.

In order to simplify the following discussion, the solar flare forecasting model learned from samples within 30° of solar disk center is called model 1, and the solar flare forecasting model learned from samples outside 30° of solar disk center is called model 2. The contingency tables of model 1 and model 2 are shown in Tables 3 and 4. Based on these contingency tables, the performances of the two forecasting models can be compared by the performance metrics shown in Table 5.

Table 3: Contingency table for model 1.

                   Forecast positive    Forecast negative
Actual positive    53                   21
Actual negative    161                  1201

Table 4: Contingency table for model 2.

                   Forecast positive    Forecast negative
Actual positive    78                   23
Actual negative    464                  965

Table 5: Performances of solar flare forecasting models.

           TP rate    TN rate    HSS
Model 1    71.6%      88.2%      0.316
Model 2    77.2%      67.5%      0.148

From Table 5, we can find that the performance of model 2 is worse than that of model 1, because the physical parameters used in model 2 could be influenced by the projection effect. However, the performance of model 2 is acceptable. Combining model 1 and model 2, we can obtain a full-disk solar flare forecasting model.

At present, little work can provide forecasting results of solar flares in active regions beyond 30 degrees of the solar disk center; hence, we choose the forecasting results for active regions within 30 degrees to compare with the flare forecasting results provided by the convolutional neural network (CNN) [24]. The results are shown in Table 6. We find that the flare forecasting model built by the convolutional neural network has a higher TP rate, while our forecasting model has a higher TN rate. Because the proportions of flaring samples and nonflaring samples are different in the testing datasets, the HSS values are not comparable.

Table 6: Performance comparisons.

Performance index    Decision tree    CNN
TP rate              71.6%            85.0%
TN rate              88.2%            81.0%
HSS                  0.316            0.143
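The metrics in (6)-(9) can be checked against the contingency tables of model 1 and model 2 with a few lines of Python. This is a minimal sketch: the class and method names are illustrative and do not come from the paper, and only the counts are taken from Tables 3 and 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """Counts of a binary forecast, laid out as in Table 2."""
    tp: int  # actual positive, forecast positive
    fn: int  # actual positive, forecast negative
    fp: int  # actual negative, forecast positive
    tn: int  # actual negative, forecast negative

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def tp_rate(self) -> float:
        """Eq. (6): fraction of flaring samples forecast correctly."""
        return self.tp / (self.tp + self.fn)

    def tn_rate(self) -> float:
        """Eq. (7): fraction of nonflaring samples forecast correctly."""
        return self.tn / (self.tn + self.fp)

    def hss(self) -> float:
        """Eqs. (8)-(9): Heidke skill score relative to a random forecast."""
        n = self.total
        pc = (self.tp + self.tn) / n
        e = ((self.tp + self.fn) * (self.tp + self.fp)
             + (self.tn + self.fn) * (self.tn + self.fp)) / n ** 2
        return (pc - e) / (1 - e)


model_1 = Contingency(tp=53, fn=21, fp=161, tn=1201)   # Table 3
model_2 = Contingency(tp=78, fn=23, fp=464, tn=965)    # Table 4

for name, table in (("Model 1", model_1), ("Model 2", model_2)):
    print(name,
          f"TP rate {100 * table.tp_rate():.1f}%",
          f"TN rate {100 * table.tn_rate():.1f}%",
          f"HSS {table.hss():.3f}")
```

Evaluating the two tables this way gives the TP rate, TN rate, and HSS values listed in Table 5, which also confirms that the denominators in (9) are the squared sample count.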
5. Conclusion

The Space Weather HMI Active Region Patches data product automatically identifies the active regions when they cross the solar disk. We classify the active region samples into two groups by their location information. The active region samples located within 30° of solar disk center are classified into group one, and the rest of the samples are classified into group two. The projection effect on the samples in group one is negligible, but the magnetic parameters extracted from active regions in group two may not be very accurate because of the projection effect. Two solar flare forecasting models are built using the data mining method from the two groups of samples, respectively. The performances of these two forecasting models are estimated. The performance of the forecasting model learned from samples within 30° of solar disk center is better than that of the forecasting model learned from the other samples, because the parameters extracted from the active regions outside 30° of solar disk center are not accurate enough, and uncertainty is introduced when evaluating the nonpotentiality of these active regions. A full-disk solar flare forecasting model is generated by combining the two models together.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The data used herein was made possible by funding to NWRA from NASA/LWS contract NNH09CE72C (Dr. Graham Barnes, PI). This work is supported by the National Natural Science Foundation of China (NSFC) (Grant No. 11303051), Beijing Intelligent Logistics System Collaborative Innovation Center, and Beijing Key Laboratory (No. BZ0211).

References

[1] G. Ai, H. Wang, and J. Wang, “What is a solar electromagnetic storm?” Space Weather Journal, vol. 10, no. 9, 2012.
[2] A. Heck and F. Murtagh, “Knowledge-based systems in astronomy,” in Lecture Notes in Physics, 1989, 329.
[3] P. S. McIntosh, “The classification of sunspot groups,” Solar Physics, vol. 125, no. 2, pp. 251–267, 1990.
[4] P. L. Bornmann and D. Shaw, “Flare rates and the McIntosh active-region classifications,” Solar Physics, vol. 150, no. 1-2, pp. 127–146, 1994.
[5] M. S. Wheatland, “A Bayesian approach to solar flare prediction,” The Astrophysical Journal, vol. 609, p. 1134, 2004.
[6] K. D. Leka and G. Barnes, “Photospheric magnetic field properties of flaring versus flare-quiet active regions. IV. A statistically significant sample,” The Astrophysical Journal, vol. 656, no. 2, p. 1173, 2007.
[7] G. Barnes, K. D. Leka, E. A. Schumer, and D. J. Della-Rose, “Probabilistic forecasting of solar flares from vector magnetogram data,” Space Weather, vol. 5, no. 9, p. S09002, 2007.
[8] G. Bradshaw, R. Fozzard, and L. Ceci, “A connectionist expert system that actually works,” Adv Neu Inform Proc Sys, vol. 1, pp. 248–255, 1989.
[9] H. N. Wang, Y. M. Cui, R. Li, L. Y. Zhang, and H. Han, “Solar flare forecasting model supported with artificial neural network techniques,” Advances in Space Research, vol. 42, no. 9, p. 1464.
[10] R. Li, H. N. Wang, H. He, Y. M. Cui, and Z. L. Du, “Support vector machine combined with k-nearest neighbors for solar flare forecasting,” Chinese Journal of Astronomy and Astrophysics, vol. 7, no. 3, p. 441, 2007.
[11] R. Qahwaji and T. Colak, “Automatic short-term solar flare prediction using machine learning and sunspot associations,” Solar Physics, vol. 241, p. 195, 2007.
[12] D. Yu, X. Huang, H. Wang, and Y. Cui, “Short-term solar flare prediction using a sequential supervised learning method,” Solar Physics, vol. 255, no. 1, pp. 91–105, 2009.
[13] D. Yu, X. Huang, Q. Hu, R. Zhou, H. Wang, and Y. Cui, “Short-term solar flare prediction using multiresolution predictors,” The Astrophysical Journal, vol. 709, no. 1, p. 321, 2010.
[14] D. Yu, X. Huang, H. Wang, Y. Cui, Q. Hu, and R. Zhou, “Short-term solar flare level prediction using a Bayesian network approach,” The Astrophysical Journal, vol. 710, no. 1, p. 869, 2010.
[15] X. Huang, D. Yu, Q. Hu, H. Wang, and Y. Cui, “Short-term solar flare prediction using predictor teams,” Solar Physics, vol. 263, no. 1-2, pp. 175–184, 2010.
[16] X. Huang and H. N. Wang, “Solar flare prediction using highly stressed longitudinal magnetic field parameters,” Research in Astronomy and Astrophysics, vol. 13, no. 3, pp. 351–358, 2013.
[17] X. Huang, L. Zhang, H. Wang, and L. Li, “Improving the performance of solar flare prediction using active longitudes information,” Astronomy & Astrophysics, vol. 549, article A127, p. 6, 2013.
[18] X. Huang, H. Wang, L. Xu, J. Liu, R. Li, and X. Dai, “Deep learning based solar flare forecasting model. I. Results for line-of-sight magnetograms,” The Astrophysical Journal, vol. 856, no. 1, p. 7, 2018.
[19] N. Nishizuka, K. Sugiura, Y. Kubo, M. Den, and M. Ishii, “Deep Flare Net (DeFN) model for solar flare prediction,” The Astrophysical Journal, vol. 858, no. 2, 2018.
[20] J. Schou, P. H. Scherrer, R. I. Bush et al., “Design and ground calibration of the Helioseismic and Magnetic Imager (HMI) instrument on the Solar Dynamics Observatory (SDO),” Solar Physics, vol. 275, no. 1-2, pp. 229–259, 2012.
[21] M. G. Bobra, X. Sun, J. T. Hoeksema et al., “The Helioseismic and Magnetic Imager (HMI) vector magnetic field pipeline: SHARPs – Space-Weather HMI Active Region Patches,” Solar Physics, vol. 289, no. 9, pp. 3549–3578, 2014.
[22] Y. Cui, R. Li, L. Zhang, Y. He, and H. Wang, “Correlation between solar flare productivity and photospheric magnetic field properties,” Solar Physics, vol. 237, p. 45, 2006.
[23] X. Huang, H. N. Wang, and X. H. Dai, Science China Physics, Mechanics and Astronomy, vol. 55, no. 10, pp. 1956–1962.
[24] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, Calif, USA, 1993.
[25] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, San Mateo, Calif, USA, 2005.

Journal

Advances in Astronomy, Hindawi Publishing Corporation

Published: Aug 1, 2019

References