Mining Educational Data to Analyze Students' Performance

(IJACSA) International Journal of Advanced Computer Science and Applications

Brijesh Kumar Baradwaj
Research Scholar, Singhaniya University, Rajasthan, India

Saurabh Pal
Sr. Lecturer, Dept. of MCA, VBS Purvanchal University, Jaunpur-222001, India

Abstract— The main objective of higher education institutions is to provide quality education to their students. One way to achieve the highest level of quality in a higher education system is by discovering knowledge for prediction regarding enrolment of students in a particular course, alienation of the traditional classroom teaching model, detection of unfair means used in online examinations, detection of abnormal values in students' result sheets, prediction of students' performance, and so on. This knowledge is hidden in the educational data set and is extractable through data mining techniques. The present paper is designed to demonstrate the capabilities of data mining techniques in the context of higher education by offering a data mining model for a university's higher education system. In this research, the classification task is used to evaluate students' performance; as there are many approaches to data classification, the decision tree method is used here. By this task we extract knowledge that describes students' performance in the end-semester examination. It helps in identifying dropouts and students who need special attention early, and allows the teacher to provide appropriate advising and counseling.

Keywords— Educational Data Mining (EDM); Classification; Knowledge Discovery in Database (KDD); ID3 Algorithm.

I. INTRODUCTION

The advent of information technology in various fields has led to the storage of large volumes of data in various formats such as records, files, documents, images, sound, videos, scientific data and many new data formats. The data collected from different applications require proper methods for extracting knowledge from large repositories for better decision making. Knowledge discovery in databases (KDD), often called data mining, aims at the discovery of useful information from large collections of data [1]. The main functions of data mining are applying various methods and algorithms in order to discover and extract patterns from stored data [2]. Data mining and knowledge discovery applications have received much attention due to their significance in decision making, and they have become essential components in various organizations. Data mining techniques draw on fields such as statistics, databases, machine learning, pattern recognition, artificial intelligence and computation.

There are increasing research interests in using data mining in education. This new emerging field, called Educational Data Mining, is concerned with developing methods that discover knowledge from data originating from educational environments [3]. Educational Data Mining uses many techniques such as decision trees, neural networks, Naïve Bayes, k-nearest neighbor, and many others. Using these techniques, many kinds of knowledge can be discovered, such as association rules, classifications and clusterings. The discovered knowledge can be used for prediction regarding enrolment of students in a particular course, alienation of the traditional classroom teaching model, detection of unfair means used in online examinations, detection of abnormal values in students' result sheets, prediction of students' performance, and so on.

The main objective of this paper is to use data mining methodologies to study students' performance in their courses. Data mining provides many tasks that could be used for this purpose. In this research, the classification task is used to evaluate students' performance; as there are many approaches to data classification, the decision tree method is used here. Information such as attendance, class test, seminar and assignment marks was collected from the students' management system to predict performance at the end of the semester. This paper investigates the accuracy of decision tree techniques for predicting student performance.

II. DATA MINING DEFINITION AND TECHNIQUES

Data mining, also popularly known as Knowledge Discovery in Databases, refers to extracting or "mining" knowledge from large amounts of data. Data mining techniques are used to operate on large volumes of data to discover hidden patterns and relationships helpful in decision making. While data mining and knowledge discovery in databases are frequently treated as synonyms, data mining is actually part of the knowledge discovery process. The sequence of steps involved in extracting knowledge from data is shown in Figure 1.

Figure 1: The steps of extracting knowledge from data

Various algorithms and techniques such as classification, clustering, regression, artificial intelligence, neural networks, association rules, decision trees, genetic algorithms and the nearest neighbor method are used for knowledge discovery from databases. These techniques and methods merit a brief mention for better understanding.

A. Classification
Classification is the most commonly applied data mining technique. It employs a set of pre-classified examples to develop a model that can classify the population of records at large. This approach frequently employs decision-tree- or neural-network-based classification algorithms. The data classification process involves learning and classification. In learning, the training data are analyzed by a classification algorithm. In classification, test data are used to estimate the accuracy of the classification rules. If the accuracy is acceptable, the rules can be applied to new data tuples. The classifier-training algorithm uses the pre-classified examples to determine the set of parameters required for proper discrimination; the algorithm then encodes these parameters into a model called a classifier.

B. Clustering
Clustering can be described as the identification of similar classes of objects. Using clustering techniques we can identify dense and sparse regions in the object space and discover overall distribution patterns and correlations among data attributes. A classification approach can also serve as an effective means of distinguishing groups or classes of objects, but it becomes costly, so clustering can be used as a preprocessing step for attribute subset selection and classification.

C. Prediction
Regression techniques can be adapted for prediction. Regression analysis can be used to model the relationship between one or more independent variables and a dependent variable. In data mining, the independent variables are attributes that are already known, and the response variable is what we want to predict. Unfortunately, many real-world problems are not simple prediction, so more complex techniques (e.g., logistic regression, decision trees, or neural nets) may be necessary to forecast future values. The same model types can often be used for both regression and classification. For example, the CART (Classification and Regression Trees) decision tree algorithm can be used to build both classification trees (to classify categorical response variables) and regression trees (to forecast continuous response variables). Neural networks, too, can create both classification and regression models.

D. Association rules
Association and correlation analysis is usually used to find frequent itemsets among large data sets. This type of finding helps businesses make decisions such as catalogue design, cross-marketing and customer shopping behavior analysis. Association rule algorithms need to be able to generate rules with confidence values less than one. However, the number of possible association rules for a given dataset is generally very large, and a high proportion of the rules are usually of little (if any) value.

E. Neural networks
A neural network is a set of connected input/output units in which each connection has an associated weight. During the learning phase, the network learns by adjusting the weights so as to predict the correct class labels of the input tuples. Neural networks have the remarkable ability to derive meaning from complicated or imprecise data, and can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. They are well suited for continuous-valued inputs and outputs, are best at identifying patterns or trends in data, and are well suited for prediction and forecasting needs.

F. Decision Trees
A decision tree is a tree-shaped structure that represents sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-squared Automatic Interaction Detection (CHAID).

G. Nearest Neighbor Method
A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k is greater than or equal to 1). It is sometimes called the k-nearest neighbor technique.
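As an illustration of the nearest neighbor method just described, the short sketch below (plain Python; the student records, attendance and test percentages, are hypothetical and not taken from this paper's data set) classifies a new record by majority vote among its k closest training records:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k training
    records closest to it (Euclidean distance on the features)."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda rec: dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical records: (attendance %, class test %) -> result
train = [((90, 80), "Pass"), ((85, 75), "Pass"), ((95, 90), "Pass"),
         ((40, 35), "Fail"), ((50, 30), "Fail")]
print(knn_predict(train, (88, 78), k=3))  # -> Pass
```

With k greater than 1, the majority vote makes the prediction robust to a single mislabeled neighbor, which is why odd values of k are commonly chosen for two-class problems.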
III. RELATED WORK

Data mining in higher education is a recent research field, and this area is gaining popularity because of its potential for educational institutes.

Data mining can be used in the educational field to enhance our understanding of the learning process, focusing on identifying, extracting and evaluating variables related to the learning process of students, as described by Alaa el-Halees [4]. Mining in an educational environment is called Educational Data Mining.

Han and Kamber [3] describe data mining software that allows users to analyze data from different dimensions, categorize it, and summarize the relationships identified during the mining process.

Pandey and Pal [5] conducted a study on student performance by selecting 600 students from different colleges of Dr. R. M. L. Awadh University, Faizabad, India. By means of Bayes classification on category, language and background qualification, it was determined whether newcomer students would perform well or not.

Hijazi and Naqvi [6] conducted a study on student performance by selecting a sample of 300 students (225 males, 75 females) from a group of colleges affiliated with Punjab University of Pakistan. The hypothesis framed was "Student's attitude towards attendance in class, hours spent in study on daily basis after college, students' family income, students' mother's age and mother's education are significantly related with student performance". By means of simple linear regression analysis, it was found that factors like mother's education and student's family income were highly correlated with student academic performance.

Khan [7] conducted a performance study on 400 students, comprising 200 boys and 200 girls, selected from the senior secondary school of Aligarh Muslim University, Aligarh, India, with the main objective of establishing the prognostic value of different measures of cognition, personality and demographic variables for success at the higher secondary level in the science stream. The selection was based on a cluster sampling technique in which the entire population of interest was divided into groups, or clusters, and a random sample of these clusters was selected for further analysis. It was found that girls with high socio-economic status had relatively higher academic achievement in the science stream, and boys with low socio-economic status had relatively higher academic achievement in general.

Galit et al. [8] gave a case study that used students' data to analyze their learning behavior, to predict results, and to warn students at risk before their final exams.

Al-Radaideh et al. [9] applied a decision tree model to predict the final grade of students who studied the C++ course at Yarmouk University, Jordan, in the year 2005. Three different classification methods, namely ID3, C4.5 and Naïve Bayes, were used. Their results indicated that the decision tree model had better prediction accuracy than the other models.

Pandey and Pal [10] conducted a study on student performance by selecting 60 students from a degree college of Dr. R. M. L. Awadh University, Faizabad, India. By means of association rules, they examined students' interest in opting for a class teaching language.

Ayesha, Mustafa, Sattar and Khan [11] describe the use of the k-means clustering algorithm to predict students' learning activities. The information generated after applying the data mining technique may be helpful to instructors as well as students.

Bray [12], in his study on private tutoring and its implications, observed that the percentage of students receiving private tutoring in India was relatively higher than in Malaysia, Singapore, Japan, China and Sri Lanka. It was also observed that academic performance improved with the intensity of private tutoring, and that this intensity depends on a collective factor, namely socio-economic conditions.

Bhardwaj and Pal [13] conducted a study on student performance by selecting 300 students from 5 different degree colleges conducting the BCA (Bachelor of Computer Applications) course of Dr. R. M. L. Awadh University, Faizabad, India. By means of a Bayesian classification method on 17 attributes, it was found that factors like students' grade in the senior secondary exam, living location, medium of teaching, mother's qualification, students' other habits, family annual income and family status were highly correlated with student academic performance.

IV. DATA MINING PROCESS

In the present educational system, a student's performance is determined by internal assessment and the end-semester examination. The internal assessment is carried out by the teacher based upon the student's performance in educational activities such as class tests, seminars, assignments, general proficiency, attendance and lab work. The end-semester mark is the score the student obtains in the semester examination. Each student has to obtain minimum marks to pass a semester in both the internal and the end-semester examination.

A. Data Preparation
The data set used in this study was obtained from the Computer Applications department of VBS Purvanchal University, Jaunpur (Uttar Pradesh), for the MCA (Master of Computer Applications) course, covering the sessions 2007 to 2010. The initial size of the data set is 50 records. In this step, data stored in different tables was joined into a single table; after the joining process, errors were removed.

B. Data selection and transformation
In this step only those fields were selected which were required for data mining.
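The joining and field-selection steps described in IV-A and IV-B can be sketched as follows (plain Python; the table names and field layout here are illustrative assumptions, not the university's actual schema):

```python
# Hypothetical source tables keyed by student id: internal-assessment
# records and end-semester results, to be merged into one mining table.
internal = {101: {"CTG": "Good", "SEM": "Average", "ASS": "Yes",
                  "GP": "Yes", "ATT": "Good", "LW": "Yes"},
            102: {"CTG": "Poor", "SEM": "Poor", "ASS": "No",
                  "GP": "No", "ATT": "Poor", "LW": "No"}}
exams = {101: {"PSM": "First", "ESM": "First"},
         102: {"PSM": "Third", "ESM": "Fail"}}

# Predictor variables plus the response variable ESM.
FIELDS = ["PSM", "CTG", "SEM", "ASS", "GP", "ATT", "LW", "ESM"]

def join_and_select(internal, exams, fields):
    """Inner-join the two tables on student id, keeping only the
    predictor and response variables listed in `fields`."""
    rows = []
    for sid in sorted(internal.keys() & exams.keys()):
        merged = {**internal[sid], **exams[sid]}
        rows.append({f: merged[f] for f in fields})
    return rows

for row in join_and_select(internal, exams, FIELDS):
    print(row)
```

Selecting only the fields needed for mining at this stage keeps every later step (gain computation, tree construction) free of irrelevant attributes.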
A few derived variables were also selected, with some of the information for the variables extracted from the database. All the predictor and response variables derived from the database are given in Table I for reference.

TABLE I. STUDENT RELATED VARIABLES

Variable | Description | Possible Values
PSM | Previous Semester Marks | {First >60%, Second >45% and <60%, Third >36% and <45%, Fail <36%}
CTG | Class Test Grade | {Poor, Average, Good}
SEM | Seminar Performance | {Poor, Average, Good}
ASS | Assignment | {Yes, No}
GP | General Proficiency | {Yes, No}
ATT | Attendance | {Poor, Average, Good}
LW | Lab Work | {Yes, No}
ESM | End Semester Marks | {First >60%, Second >45% and <60%, Third >36% and <45%, Fail <36%}

The domain values for some of the variables were defined for the present investigation as follows:

- PSM - Previous semester marks/grade obtained in the MCA course. It is split into four class values: First >60%; Second >45% and <60%; Third >36% and <45%; Fail <36%.
- CTG - Class test grade obtained. In each semester two class tests are conducted, and the average of the two is used to calculate sessional marks. CTG is split into three classes: Poor <40%; Average >40% and <60%; Good >60%.
- SEM - Seminar performance. In each semester seminars are organized to check students' performance. Seminar performance is evaluated into three classes: Poor - presentation and communication skills are low; Average - either the presentation or the communication skills are fine; Good - both presentation and communication skills are fine.
- ASS - Assignment performance. In each semester two assignments are given to students by each teacher. Assignment performance is divided into two classes: Yes - student submitted the assignment; No - student did not submit the assignment.
- GP - General proficiency performance. Like the seminar, general proficiency tests are organized in each semester. General proficiency is divided into two classes: Yes - student participated; No - student did not participate.
- ATT - Attendance of the student. A minimum of 70% attendance is compulsory to sit the end-semester examination, although in special cases students with low attendance may also sit the examination for genuine reasons. Attendance is divided into three classes: Poor <60%; Average >60% and <80%; Good >80%.
- LW - Lab work. Lab work is divided into two classes: Yes - student completed lab work; No - student did not complete lab work.
- ESM - End-semester marks obtained in the MCA semester; this is declared the response variable. It is split into four class values: First >60%; Second >45% and <60%; Third >36% and <45%; Fail <36%.

C. Decision Tree
A decision tree is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision. Decision trees are commonly used for gaining information for decision making. A decision tree starts with a root node from which users split each node recursively according to a decision tree learning algorithm. The final result is a decision tree in which each branch represents a possible scenario of a decision and its outcome. Three widely used decision tree learning algorithms are ID3, ASSISTANT and C4.5.

D. The ID3 Decision Tree
ID3 is a simple decision tree learning algorithm developed by Ross Quinlan [14]. The basic idea of the ID3 algorithm is to construct the decision tree by employing a top-down, greedy search through the given sets to test each attribute at every tree node. In order to select the attribute that is most useful for classifying a given set, we introduce a metric: information gain. To find an optimal way to classify a learning set, we need to minimize the questions asked (i.e., minimize the depth of the tree). Thus, we need a function which can measure which questions provide the most balanced splitting. The information gain metric is such a function.

E. Measuring Impurity
Given a data table that contains attributes and the class of each record, we can measure the homogeneity (or heterogeneity) of the table based on the classes. We say a table is pure or homogeneous if it contains only a single class. If a data table contains several classes, we say that the table is impure or heterogeneous. There are several indices that measure the degree of impurity quantitatively; the best known are entropy, the Gini index, and classification error.

Entropy = - SUM_j p_j log2 p_j

The entropy of a pure table (consisting of a single class) is zero, because the probability is 1 and log2(1) = 0. Entropy reaches its maximum value when all classes in the table have equal probability.

Gini Index = 1 - SUM_j p_j^2

The Gini index of a pure table is likewise zero, because the probability is 1 and 1 - 1^2 = 0. Like entropy, the Gini index reaches its maximum value when all classes in the table have equal probability.

Classification Error = 1 - max_j p_j

Similarly, the classification error index of a pure table is zero, because the probability is 1 and 1 - max(1) = 0. The value of the classification error index always lies between 0 and 1. In fact, the maximum Gini index for a given number of classes n is always equal to the maximum classification error index: with all probabilities equal to 1/n, the maximum Gini index is 1 - n(1/n)^2 = 1 - 1/n, while the maximum classification error index is 1 - max(1/n) = 1 - 1/n.
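These three indices can be checked numerically with a short sketch (plain Python; the label lists are hypothetical examples, not rows from this study):

```python
import math
from collections import Counter

def impurity(labels):
    """Return (entropy, Gini index, classification error) of a
    list of class labels."""
    n = len(labels)
    probs = [c / n for c in Counter(labels).values()]
    entropy = sum(-p * math.log2(p) for p in probs)
    gini = 1 - sum(p * p for p in probs)
    error = 1 - max(probs)
    return entropy, gini, error

# A pure table scores zero on all three indices:
print(impurity(["First"] * 10) == (0, 0, 0))   # True
# Two equally likely classes maximize each index:
print(impurity(["Pass", "Fail"] * 5))          # (1.0, 0.5, 0.5)
```

The second call shows the 1 - 1/n relationship noted above: with n = 2 equiprobable classes, both the Gini index and the classification error reach 1 - 1/2 = 0.5.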
F. Splitting Criteria
To determine the best attribute for a particular node in the tree, we use the measure called information gain. The information gain, Gain(S, A), of an attribute A relative to a collection of examples S is defined as

Gain(S, A) = Entropy(S) - SUM_{v in Values(A)} (|S_v| / |S|) Entropy(S_v)

where Values(A) is the set of all possible values of attribute A, and S_v is the subset of S for which attribute A has value v (i.e., S_v = {s in S | A(s) = v}). The first term in the equation is just the entropy of the original collection S, and the second term is the expected value of the entropy after S is partitioned using attribute A: simply the sum of the entropies of each subset S_v, weighted by the fraction |S_v|/|S| of examples that belong to S_v. Gain(S, A) is therefore the expected reduction in entropy caused by knowing the value of attribute A.

A related measure, split information, penalizes attributes with many values:

Split Information(S, A) = - SUM_{i=1..c} (|S_i| / |S|) log2 (|S_i| / |S|)

where S_1 through S_c are the c subsets of S produced by partitioning on the c-valued attribute A, and

Gain Ratio(S, A) = Gain(S, A) / Split Information(S, A)

The process of selecting a new attribute and partitioning the training examples is repeated for each non-terminal descendant node. Attributes that have been incorporated higher in the tree are excluded, so that any given attribute can appear at most once along any path through the tree. This process continues for each new leaf node until either of two conditions is met:
1. every attribute has already been included along this path through the tree, or
2. the training examples associated with this leaf node all have the same target attribute value (i.e., their entropy is zero).

G. The ID3 Algorithm

ID3 (Examples, Target_Attribute, Attributes)
- Create a root node for the tree.
- If all examples are positive, return the single-node tree Root, with label = +.
- If all examples are negative, return the single-node tree Root, with label = -.
- If the list of predicting attributes is empty, return the single-node tree Root, with label = the most common value of the target attribute in the examples.
- Otherwise begin:
  - A = the attribute that best classifies the examples.
  - Decision tree attribute for Root = A.
  - For each possible value v_i of A:
    - Add a new tree branch below Root, corresponding to the test A = v_i.
    - Let Examples(v_i) be the subset of examples that have the value v_i for A.
    - If Examples(v_i) is empty, then below this new branch add a leaf node with label = the most common target value in the examples.
    - Else below this new branch add the subtree ID3(Examples(v_i), Target_Attribute, Attributes - {A}).
- Return Root.

V. RESULTS AND DISCUSSION

The data set of 50 students used in this study was obtained from the Computer Applications department (MCA course) of VBS Purvanchal University, Jaunpur (Uttar Pradesh), for the sessions 2007 to 2010. The data set is shown in Table II.
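The ID3 procedure above, together with the information gain criterion of Section IV-F and the IF-THEN rule extraction discussed below, can be sketched in plain Python. This is a minimal illustration run on a hypothetical four-record sample, not a re-implementation of the authors' analysis:

```python
import math
from collections import Counter

def entropy(rows, target):
    n = len(rows)
    counts = Counter(r[target] for r in rows).values()
    return sum(-(c / n) * math.log2(c / n) for c in counts)

def gain(rows, attr, target):
    """Information gain: Entropy(S) minus the weighted entropy
    of the subsets produced by splitting on `attr`."""
    n = len(rows)
    remainder = sum(
        len(sub) / n * entropy(sub, target)
        for v in {r[attr] for r in rows}
        for sub in [[r for r in rows if r[attr] == v]])
    return entropy(rows, target) - remainder

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:           # pure node -> leaf
        return labels[0]
    if not attrs:                       # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a, target))
    return {best: {v: id3([r for r in rows if r[best] == v],
                          [a for a in attrs if a != best], target)
                   for v in {r[best] for r in rows}}}

def rules(tree, path=()):
    """Flatten the tree into one IF-THEN rule per root-to-leaf path."""
    if not isinstance(tree, dict):
        cond = " AND ".join(f"{a} = '{v}'" for a, v in path)
        return [f"IF {cond} THEN ESM = '{tree}'"]
    (attr, branches), = tree.items()
    return [r for v, sub in branches.items()
            for r in rules(sub, path + ((attr, v),))]

# Hypothetical four-record sample (not rows from Table II):
sample = [{"PSM": "First", "ATT": "Good", "ESM": "First"},
          {"PSM": "First", "ATT": "Poor", "ESM": "Second"},
          {"PSM": "Second", "ATT": "Good", "ESM": "First"},
          {"PSM": "Third", "ATT": "Poor", "ESM": "Third"}]
tree = id3(sample, ["PSM", "ATT"], "ESM")
for rule in sorted(rules(tree)):
    print(rule)
```

On this tiny sample the recursion selects PSM at the root and only needs ATT beneath the PSM = 'First' branch; the other branches are already pure and become leaves immediately, which is exactly stopping condition 2 of Section IV-F.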
TABLE II. DATA SET

S.No. PSM CTG SEM ASS GP ATT LW ESM
1. First Good Good Yes Yes Good Yes First
2. First Good Average Yes No Good Yes First
3. First Good Average No No Average No First
4. First Average Good No No Good Yes First
5. First Average Average No Yes Good Yes First
6. First Poor Average No No Average Yes First
7. First Poor Average No No Poor Yes Second
8. First Average Poor Yes Yes Average No First
9. First Poor Poor No No Poor No Third
10. First Average Average Yes Yes Good No First
11. Second Good Good Yes Yes Good Yes First
12. Second Good Average Yes Yes Good Yes First
13. Second Good Average Yes No Good No First
14. Second Average Good Yes Yes Good No First
15. Second Good Average Yes Yes Average Yes First
16. Second Good Average Yes Yes Poor Yes Second
17. Second Average Average Yes Yes Good Yes Second
18. Second Average Average Yes Yes Poor Yes Second
19. Second Poor Average No Yes Good Yes Second
20. Second Average Poor Yes No Average Yes Second
21. Second Poor Average No Yes Poor No Third
22. Second Poor Poor Yes Yes Average Yes Third
23. Second Poor Poor No No Average Yes Third
24. Second Poor Poor Yes Yes Good Yes Second
25. Second Poor Poor Yes Yes Poor Yes Third
26. Second Poor Poor No No Poor Yes Fail
27. Third Good Good Yes Yes Good Yes First
28. Third Average Good Yes Yes Good Yes Second
29. Third Good Average Yes Yes Good Yes Second
30. Third Good Good Yes Yes Average Yes Second
31. Third Good Good No No Good Yes Second
32. Third Average Average Yes Yes Good Yes Second
33. Third Average Average No Yes Average Yes Third
34. Third Average Good No No Good Yes Third
35. Third Good Average No Yes Average Yes Third
36. Third Average Poor No No Average Yes Third
37. Third Poor Average Yes No Average Yes Third
38. Third Poor Average No Yes Poor Yes Fail
39. Third Average Average No Yes Poor Yes Third
40. Third Poor Poor No No Good No Third
41. Third Poor Poor No Yes Poor Yes Fail
42. Third Poor Poor No No Poor No Fail
43. Fail Good Good Yes Yes Good Yes Second
44. Fail Good Good Yes Yes Average Yes Second
45. Fail Average Good Yes Yes Average Yes Third
46. Fail Poor Poor Yes Yes Average No Fail
47. Fail Good Poor No Yes Poor Yes Fail
48. Fail Poor Poor No No Poor Yes Fail
49. Fail Average Average Yes Yes Good Yes Second
50. Fail Poor Good No No Poor No Fail

To work out the information gain of an attribute A relative to S, we first need to calculate the entropy of S. Here S is a set of 50 examples: 14 "First", 15 "Second", 13 "Third" and 8 "Fail".

Entropy(S) = - p_First log2(p_First) - p_Second log2(p_Second) - p_Third log2(p_Third) - p_Fail log2(p_Fail)
= -(14/50) log2(14/50) - (15/50) log2(15/50) - (13/50) log2(13/50) - (8/50) log2(8/50)
= 1.964

Using the information gain definition of Section IV-F, for example:

Gain(S, PSM) = Entropy(S) - (|S_First|/|S|) Entropy(S_First) - (|S_Second|/|S|) Entropy(S_Second) - (|S_Third|/|S|) Entropy(S_Third) - (|S_Fail|/|S|) Entropy(S_Fail)

TABLE III. GAIN VALUES

Gain | Value
Gain(S, PSM) | 0.577036
Gain(S, CTG) | 0.515173
Gain(S, SEM) | 0.365881
Gain(S, ASS) | 0.218628
Gain(S, GP) | 0.043936
Gain(S, ATT) | 0.451942
Gain(S, LW) | 0.453513

PSM has the highest gain; therefore it is used as the root node, as shown in Figure 2.

Figure 2. PSM as the root node, with branches First, Second, Third and Fail

Gain ratio can also be used for attribute selection. Before calculating the gain ratio, the split information of each attribute is shown in Table IV.

TABLE IV. SPLIT INFORMATION

Split Information | Value
Split(S, PSM) | 1.386579
Split(S, CTG) | 1.448442
Split(S, SEM) | 1.597734
Split(S, ASS) | 1.744987
Split(S, GP) | 1.91968
Split(S, ATT) | 1.511673
Split(S, LW) | 1.510102

The resulting gain ratios are shown in Table V.

TABLE V. GAIN RATIO

Gain Ratio | Value
Gain Ratio(S, PSM) | 0.416158
Gain Ratio(S, CTG) | 0.355674
Gain Ratio(S, SEM) | 0.229
Gain Ratio(S, ASS) | 0.125289
Gain Ratio(S, GP) | 0.022887
Gain Ratio(S, ATT) | 0.298968
Gain Ratio(S, LW) | 0.30032

This process continues until all data are classified perfectly or the algorithm runs out of attributes. The knowledge represented by the decision tree can be extracted and represented in the form of IF-THEN rules.
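The entropy value above, and the relationship between Tables III, IV and V, can be reproduced directly from the figures given in the text (a quick numerical check in plain Python):

```python
import math

# Class distribution of the 50 students in Table II:
counts = {"First": 14, "Second": 15, "Third": 13, "Fail": 8}
n = sum(counts.values())
entropy_S = sum(-(c / n) * math.log2(c / n) for c in counts.values())
print(round(entropy_S, 3))             # 1.964, as in the text

# Gain Ratio = Gain / Split Information, values from Tables III and IV:
gain_psm, split_psm = 0.577036, 1.386579
print(round(gain_psm / split_psm, 6))  # 0.416158, as in Table V
```

The same division reproduces the other rows of Table V from Tables III and IV, e.g. 0.515173 / 1.448442 for CTG.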
IF PSM = 'First' AND ATT = 'Good' AND CTG = 'Good' OR 'Average' THEN ESM = 'First'
IF PSM = 'First' AND CTG = 'Good' AND ATT = 'Good' OR 'Average' THEN ESM = 'First'
IF PSM = 'Second' AND ATT = 'Good' AND ASS = 'Yes' THEN ESM = 'First'
IF PSM = 'Second' AND CTG = 'Average' AND LW = 'Yes' THEN ESM = 'Second'
IF PSM = 'Third' AND CTG = 'Good' OR 'Average' AND ATT = 'Good' OR 'Average' THEN ESM = 'Second'
IF PSM = 'Third' AND ASS = 'No' AND ATT = 'Average' THEN ESM = 'Third'
IF PSM = 'Fail' AND CTG = 'Poor' AND ATT = 'Poor' THEN ESM = 'Fail'

Figure 3. Rule set generated by the decision tree

One classification rule can be generated for each path from the root node to a terminal node. A pruning technique was applied by removing nodes with fewer than the desired number of objects. The IF-THEN rules, which may be easier to understand, are shown in Figure 3.

CONCLUSION

In this paper, the classification task is applied to a student database to predict students' division on the basis of previous data. As there are many approaches to data classification, the decision tree method was used. Information such as attendance, class test, seminar and assignment marks was collected from the students' previous records to predict performance at the end of the semester.

This study will help students and teachers to improve the students' divisions. It will also help to identify those students who need special attention, in order to reduce the failure ratio and to take appropriate action before the next semester examination.

REFERENCES
[1] H. Mannila, "Data mining: machine learning, statistics, and databases", IEEE, 1996.
[2] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, From Data Mining to Knowledge Discovery in Databases, AAAI Press / The MIT Press, ISBN 0-262-56097-6, 1996.
[3] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
[4] Alaa el-Halees, "Mining students data to analyze e-Learning behavior: A case study", 2009.
[5] U. K. Pandey and S. Pal, "Data Mining: A prediction of performer or underperformer using classification", (IJCSIT) International Journal of Computer Science and Information Technology, Vol. 2(2), pp. 686-690, ISSN:0975-9646, 2011.
[6] S. T. Hijazi and R. S. M. M. Naqvi, "Factors affecting students' performance: A case of private colleges", Bangladesh e-Journal of Sociology, Vol. 3, No. 1, 2006.
[7] Z. N. Khan, "Scholastic achievement of higher secondary students in science stream", Journal of Social Sciences, Vol. 1, No. 2, pp. 84-87, 2005.
[8] Galit et al., "Examining online learning processes based on log files analysis: a case study", Research, Reflections and Innovations in Integrating ICT in Education, 2007.
[9] Q. A. Al-Radaideh, E. W. Al-Shawakfa, and M. I. Al-Najjar, "Mining student data using decision trees", International Arab Conference on Information Technology (ACIT'2006), Yarmouk University, Jordan, 2006.
[10] U. K. Pandey and S. Pal, "A data mining view on class room teaching language", (IJCSI) International Journal of Computer Science Issues, Vol. 8, Issue 2, pp. 277-282, ISSN:1694-0814, 2011.
[11] S. Ayesha, T. Mustafa, A. R. Sattar, and M. I. Khan, "Data mining model for higher education system", European Journal of Scientific Research, Vol. 43, No. 1, pp. 24-29, 2010.
[12] M. Bray, The Shadow Education System: Private Tutoring and Its Implications for Planners (2nd ed.), UNESCO, Paris, France, 2007.
[13] B. K. Bhardwaj and S. Pal, "Data Mining: A prediction for performance improvement using classification", International Journal of Computer Science and Information Security (IJCSIS), Vol. 9, No. 4, pp. 136-140, 2011.
[14] J. R. Quinlan, "Induction of decision trees", Machine Learning, 1: pp. 81-106, 1986.
[15] S. Vashishta, "Efficient retrieval of text for biomedical domain using data mining algorithm", IJACSA - International Journal of Advanced Computer Science and Applications, Vol. 2, No. 4, pp. 77-80, 2011.
[16] V. Kumar, "An empirical study of the applications of data mining techniques in higher education", IJACSA - International Journal of Advanced Computer Science and Applications, Vol. 2, No. 3, pp. 80-84, 2011. Retrieved from http://ijacsa.thesai.org.

AUTHORS PROFILE

Brijesh Kumar Bhardwaj is Assistant Professor in the Department of Computer Applications, Dr. R. M. L. Avadh University, Faizabad, India. He obtained his M.C.A. degree from Dr. R. M. L. Avadh University, Faizabad (2003) and his M.Phil. in Computer Applications from Vinayaka Mission University, Tamil Nadu. He is currently doing research in data mining and knowledge discovery. He has published one international paper.

Saurabh Pal received his M.Sc. (Computer Science) from Allahabad University, UP, India (1996) and obtained his Ph.D. degree from Dr. R. M. L. Awadh University, Faizabad (2002). He then joined the Dept. of Computer Applications, VBS Purvanchal University, Jaunpur, as Lecturer. At present, he is working as Head and Sr. Lecturer in the Department of Computer Applications. Saurabh Pal has authored a commendable number of research papers in international/national conferences and journals and also guides research scholars in Computer Science/Applications. He is an active member of IACSIT, CSI, and the Society of Statistics and Computer Applications, and serves as a reviewer/editorial board member for more than 15 international journals. His research interests include image processing, data mining, grid computing and artificial intelligence.

Mining Educational Data to Analyze Students Performance

International Journal of Advanced Computer Science and Applications, Jan 1, 2011



ISSN: 2156-5570
DOI: 10.14569/ijacsa.2011.020609

Brijesh Kumar Baradwaj
Research Scholar, Singhaniya University, Rajasthan, India

Saurabh Pal
Sr. Lecturer, Dept. of MCA, VBS Purvanchal University, Jaunpur-222001, India

Abstract— The main objective of higher education institutions is to provide quality education to their students. One way to achieve the highest level of quality in a higher education system is by discovering knowledge for prediction regarding enrolment of students in a particular course, alienation of the traditional classroom teaching model, detection of unfair means used in online examinations, detection of abnormal values in students' result sheets, prediction of students' performance, and so on. This knowledge is hidden in the educational data set and is extractable through data mining techniques. The present paper is designed to justify the capabilities of data mining techniques in the context of higher education by offering a data mining model for the higher education system in a university. In this research, the classification task is used to evaluate students' performance, and since there are many approaches to data classification, the decision tree method is used here. By this task we extract knowledge that describes students' performance in the end-semester examination. It helps in identifying, early on, the dropouts and the students who need special attention, and allows the teacher to provide appropriate advising/counseling.

Keywords— Educational Data Mining (EDM); Classification; Knowledge Discovery in Database (KDD); ID3 Algorithm.

I. INTRODUCTION

The advent of information technology in various fields has led to large volumes of data being stored in various formats such as records, files, documents, images, sound, videos, scientific data and many new data formats. The data collected from different applications require proper methods of extracting knowledge from large repositories for better decision making.

There are increasing research interests in using data mining in education. This new emerging field, called Educational Data Mining, is concerned with developing methods that discover knowledge from data originating from educational environments [3]. Educational Data Mining uses many techniques such as decision trees, neural networks, Naïve Bayes, k-nearest neighbor, and many others.

Using these techniques, many kinds of knowledge can be discovered, such as association rules, classifications and clusterings. The discovered knowledge can be used for prediction regarding enrolment of students in a particular course, alienation of the traditional classroom teaching model, detection of unfair means used in online examinations, detection of abnormal values in students' result sheets, prediction of students' performance, and so on.

The main objective of this paper is to use data mining methodologies to study students' performance in their courses. Data mining provides many tasks that could be used for this purpose. In this research, the classification task is used to evaluate students' performance, and since there are many approaches to data classification, the decision tree method is used here. Information such as attendance, class test, seminar and assignment marks was collected from the students' management system to predict performance at the end of the semester. This paper investigates the accuracy of decision tree techniques for predicting student performance.

II. DATA MINING DEFINITION AND TECHNIQUES
Data mining, also popularly known as Knowledge Discovery in Databases (KDD), refers to extracting or "mining" knowledge from large amounts of data. Data mining techniques are used to operate on large volumes of data to discover hidden patterns and relationships helpful in decision making. Knowledge discovery in databases aims at the discovery of useful information from large collections of data [1]. The main functions of data mining are applying various methods and algorithms in order to discover and extract patterns from stored data [2]. Data mining and knowledge discovery applications have received a rich focus due to their significance in decision making, and they have become essential components in various organizations. Data mining techniques have been introduced into the fields of statistics, databases, machine learning, pattern recognition, artificial intelligence and computation. While data mining and knowledge discovery in databases are frequently treated as synonyms, data mining is actually part of the knowledge discovery process. The sequence of steps identified in extracting knowledge from data is shown in Figure 1.

Figure 1: The steps of extracting knowledge from data

Various algorithms and techniques such as classification, clustering, regression, artificial intelligence, neural networks, association rules, decision trees, genetic algorithms and the nearest neighbor method are used for knowledge discovery from databases. These techniques and methods need brief mention for better understanding.

A. Classification

Classification is the most commonly applied data mining technique. It employs a set of pre-classified examples to develop a model that can classify the population of records at large. This approach frequently employs decision tree or neural network-based classification algorithms. The data classification process involves learning and classification. In learning, the training data are analyzed by a classification algorithm. In classification, test data are used to estimate the accuracy of the classification rules. If the accuracy is acceptable, the rules can be applied to new data tuples. The classifier-training algorithm uses the pre-classified examples to determine the set of parameters required for proper discrimination, and then encodes these parameters into a model called a classifier.

B. Clustering

Clustering can be described as the identification of similar classes of objects. By using clustering techniques we can identify dense and sparse regions in object space and discover overall distribution patterns and correlations among data attributes. The classification approach can also be used as an effective means of distinguishing groups or classes of objects, but it becomes costly, so clustering can be used as a preprocessing approach for attribute subset selection and classification.

C. Prediction

Regression techniques can be adapted for prediction. Regression analysis can be used to model the relationship between one or more independent variables and dependent variables. In data mining, independent variables are attributes already known, and response variables are what we want to predict. Unfortunately, many real-world problems are not simply prediction. Therefore, more complex techniques (e.g., logistic regression, decision trees, or neural nets) may be necessary to forecast future values. The same model types can often be used for both regression and classification. For example, the CART (Classification and Regression Trees) decision tree algorithm can be used to build both classification trees (to classify categorical response variables) and regression trees (to forecast continuous response variables). Neural networks, too, can create both classification and regression models.

D. Association rule

Association and correlation analysis is usually used to find frequent item sets among large data sets. This type of finding helps businesses to make certain decisions, such as catalogue design, cross marketing and customer shopping behavior analysis. Association rule algorithms need to be able to generate rules with confidence values less than one. However, the number of possible association rules for a given dataset is generally very large, and a high proportion of the rules are usually of little (if any) value.

E. Neural networks

A neural network is a set of connected input/output units in which each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class labels of the input tuples. Neural networks have the remarkable ability to derive meaning from complicated or imprecise data and can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. They are well suited for continuous-valued inputs and outputs, are best at identifying patterns or trends in data, and are well suited for prediction or forecasting needs.

F. Decision Trees

A decision tree is a tree-shaped structure that represents sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID).

G. Nearest Neighbor Method

A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k is greater than or equal to 1), sometimes called the k-nearest neighbor technique.
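To make the nearest-neighbor idea concrete, the following is a minimal sketch (added here, not from the paper) of a k-nearest-neighbor classifier in pure Python. The toy training records and the choice of Euclidean distance are assumptions for illustration only.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among the k training records
    nearest to it. `train` is a list of (features, label) pairs."""
    neighbors = sorted(train, key=lambda rec: dist(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (marks in two internal tests) -> final division
train = [((85, 90), "First"), ((80, 75), "First"),
         ((55, 60), "Second"), ((50, 52), "Second"),
         ((30, 35), "Fail")]

print(knn_classify(train, (82, 80), k=3))  # majority of the 3 nearest is "First"
```

With k = 1 the method reduces to copying the label of the single most similar historical record.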
III. RELATED WORK

Data mining in higher education is a recent research field, and this area of research is gaining popularity because of its potential for educational institutes.

Data mining can be used in the educational field to enhance our understanding of the learning process, focusing on identifying, extracting and evaluating variables related to the learning process of students, as described by Alaa el-Halees [4]. Mining in an educational environment is called Educational Data Mining.

Han and Kamber [3] describe data mining software that allows users to analyze data from different dimensions, categorize it and summarize the relationships identified during the mining process.

Pandey and Pal [5] conducted a study on student performance by selecting 600 students from different colleges of Dr. R. M. L. Awadh University, Faizabad, India. By means of Bayes classification on category, language and background qualification, it was found whether newcomer students would be performers or not.

Hijazi and Naqvi [6] conducted a study on student performance by selecting a sample of 300 students (225 males, 75 females) from a group of colleges affiliated to Punjab University of Pakistan. The hypothesis framed was that "student's attitude towards attendance in class, hours spent in study on daily basis after college, students' family income, students' mother's age and mother's education are significantly related with student performance". By means of simple linear regression analysis, it was found that factors like mother's education and student's family income were highly correlated with student academic performance.

Khan [7] conducted a performance study on 400 students, comprising 200 boys and 200 girls, selected from the senior secondary schools of Aligarh Muslim University, Aligarh, India, with the main objective of establishing the prognostic value of different measures of cognition, personality and demographic variables for success at the higher secondary level in the science stream. The selection was based on a cluster sampling technique in which the entire population of interest was divided into groups, or clusters, and a random sample of these clusters was selected for further analysis. It was found that girls with high socio-economic status had relatively higher academic achievement in the science stream, and boys with low socio-economic status had relatively higher academic achievement in general.

Galit et al. [8] gave a case study that uses students' data to analyze their learning behavior, to predict the results and to warn students at risk before their final exams.

Al-Radaideh et al. [9] applied a decision tree model to predict the final grade of students who studied the C++ course at Yarmouk University, Jordan, in the year 2005. Three different classification methods, namely ID3, C4.5 and Naïve Bayes, were used. The outcome of their results indicated that the decision tree model had better prediction than the other models.

Pandey and Pal [10] conducted a study on student performance by selecting 60 students from a degree college of Dr. R. M. L. Awadh University, Faizabad, India. By means of association rules, they found the interestingness of students in opting for the class teaching language.

Ayesha, Mustafa, Sattar and Khan [11] describe the use of the k-means clustering algorithm to predict students' learning activities. The information generated after applying the data mining technique may be helpful for the instructor as well as for the students.

Bray [12], in his study on private tutoring and its implications, observed that the percentage of students receiving private tutoring in India was relatively higher than in Malaysia, Singapore, Japan, China and Sri Lanka. It was also observed that academic performance improved with the intensity of private tutoring, and this variation of intensity of private tutoring depends on a collective factor, namely socio-economic conditions.

Bhardwaj and Pal [13] conducted a study on student performance by selecting 300 students from 5 different degree colleges conducting the BCA (Bachelor of Computer Applications) course of Dr. R. M. L. Awadh University, Faizabad, India. By means of a Bayesian classification method on 17 attributes, it was found that factors like students' grade in the senior secondary exam, living location, medium of teaching, mother's qualification, students' other habits, family annual income and students' family status were highly correlated with student academic performance.

IV. DATA MINING PROCESS

In the present-day educational system, a student's performance is determined by the internal assessment and the end-semester examination. The internal assessment is carried out by the teacher based upon the student's performance in educational activities such as class tests, seminars, assignments, general proficiency, attendance and lab work. The end-semester examination mark is the score obtained by the student in the semester examination. Each student has to get minimum marks to pass a semester in the internal as well as the end-semester examination.

A. Data Preparation

The data set used in this study was obtained from VBS Purvanchal University, Jaunpur (Uttar Pradesh), from the Computer Applications department, course MCA (Master of Computer Applications), sessions 2007 to 2010. The initial size of the data is 50. In this step, data stored in different tables was joined into a single table; after the joining process, errors were removed.

B. Data selection and transformation

In this step only those fields were selected which were required for data mining.
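The joining step described in the data preparation above can be sketched as follows. This is an illustrative assumption only: the paper gives neither code nor table schemas, so the per-table records and the student-id key are hypothetical.

```python
# Hypothetical tables keyed by a student id, as they might come from the
# student management system (schemas are assumed, not from the paper).
internal = {1: {"CTG": "Good", "SEM": "Average", "ASS": "Yes"},
            2: {"CTG": "Poor", "SEM": "Poor", "ASS": "No"}}
attendance = {1: {"ATT": "Good"}, 2: {"ATT": "Poor"}}
results = {1: {"PSM": "First", "ESM": "First"},
           2: {"PSM": "Third", "ESM": "Fail"}}

def join_tables(*tables):
    """Join several dicts-of-records on their shared key, keeping only
    ids present in every table (incomplete/erroneous rows are dropped)."""
    common_ids = set.intersection(*(set(t) for t in tables))
    return {sid: {k: v for t in tables for k, v in t[sid].items()}
            for sid in sorted(common_ids)}

dataset = join_tables(internal, attendance, results)
print(dataset[1])  # one merged record with CTG, SEM, ASS, ATT, PSM, ESM
```

Dropping ids that are missing from any table is one simple way to realize the "errors were removed" step; in practice the cleaning rules would depend on the actual source tables.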
A few derived variables were also selected, while some of the information for the variables was extracted directly from the database. All the predictor and response variables derived from the database are given in Table I for reference.

TABLE I. STUDENT RELATED VARIABLES

Variable  Description               Possible Values
PSM       Previous Semester Marks   {First > 60%, Second > 45% & < 60%, Third > 36% & < 45%, Fail < 36%}
CTG       Class Test Grade          {Poor, Average, Good}
SEM       Seminar Performance       {Poor, Average, Good}
ASS       Assignment                {Yes, No}
GP        General Proficiency       {Yes, No}
ATT       Attendance                {Poor, Average, Good}
LW        Lab Work                  {Yes, No}
ESM       End Semester Marks        {First > 60%, Second > 45% & < 60%, Third > 36% & < 45%, Fail < 36%}

The domain values for some of the variables were defined for the present investigation as follows:

 PSM – Previous semester marks/grade obtained in the MCA course. It is split into four class values: First > 60%, Second > 45% and < 60%, Third > 36% and < 45%, Fail < 36%.

 CTG – Class test grade obtained. In each semester two class tests are conducted, and the average of the two class tests is used to calculate sessional marks. CTG is split into three classes: Poor < 40%, Average > 40% and < 60%, Good > 60%.

 SEM – Seminar performance obtained. In each semester, seminars are organized to check the performance of students. Seminar performance is evaluated into three classes: Poor – presentation and communication skill are low; Average – either presentation or communication skill is fine; Good – both presentation and communication skill are fine.

 ASS – Assignment performance. In each semester two assignments are given to students by each teacher. Assignment performance is divided into two classes: Yes – student submitted assignments; No – student did not submit assignments.

 GP – General proficiency performance. Like seminars, general proficiency tests are organized in each semester. General proficiency is divided into two classes: Yes – student participated in general proficiency; No – student did not participate.

 ATT – Attendance of the student. A minimum of 70% attendance is compulsory to participate in the end-semester examination, though in special cases low-attendance students may also participate for genuine reasons. Attendance is divided into three classes: Poor < 60%, Average > 60% and < 80%, Good > 80%.

 LW – Lab work. Lab work is divided into two classes: Yes – student completed lab work; No – student did not complete lab work.

 ESM – End-semester marks obtained in the MCA semester, declared as the response variable. It is split into four class values: First > 60%, Second > 45% and < 60%, Third > 36% and < 45%, Fail < 36%.

C. Decision Tree

A decision tree is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision. Decision trees are commonly used for gaining information for the purpose of decision making. A decision tree starts with a root node on which it is for users to take actions. From this node, users split each node recursively according to the decision tree learning algorithm. The final result is a decision tree in which each branch represents a possible scenario of decision and its outcome. The three widely used decision tree learning algorithms are ID3, ASSISTANT and C4.5.

D. The ID3 Decision Tree

ID3 is a simple decision tree learning algorithm developed by Ross Quinlan [14]. The basic idea of the ID3 algorithm is to construct the decision tree by employing a top-down, greedy search through the given sets to test each attribute at every tree node. In order to select the attribute that is most useful for classifying a given set, we introduce a metric: information gain. To find an optimal way to classify a learning set, we need to minimize the questions asked (i.e., minimize the depth of the tree). Thus, we need some function which can measure which questions provide the most balanced splitting. The information gain metric is such a function.

E. Measuring Impurity

Given a data table that contains attributes and the class of the attributes, we can measure the homogeneity (or heterogeneity) of the table based on the classes. We say a table is pure or homogeneous if it contains only a single class. If a data table contains several classes, then we say that the table is impure or heterogeneous. There are several indices to measure the degree of impurity quantitatively; the most well known are entropy, Gini index, and classification error:

    Entropy = - SUM_j p_j log2(p_j)

The entropy of a pure table (consisting of a single class) is zero, because the probability is 1 and log2(1) = 0. Entropy reaches its maximum value when all classes in the table have equal probability.

    Gini Index = 1 - SUM_j p_j^2

The Gini index of a pure table is likewise zero, because the probability is 1 and 1 - 1 = 0. Like entropy, the Gini index reaches its maximum value when all classes in the table have equal probability.

    Classification Error = 1 - max_j {p_j}

Similar to entropy and the Gini index, the classification error index of a pure table is zero, because the probability is 1 and 1 - max(1) = 0. The value of the classification error index is always between 0 and 1. In fact, the maximum Gini index for a given number of classes is always equal to the maximum classification error index: for n classes with equal probabilities p_j = 1/n, the maximum Gini index is 1 - n(1/n)^2 = 1 - 1/n, while the maximum classification error index is 1 - max{1/n} = 1 - 1/n.
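The three impurity indices above translate directly into code. The following stand-alone sketch (added here as an aid, not part of the paper) computes each index from a class-probability distribution:

```python
from math import log2

def entropy(probs):
    """Entropy = -sum(p * log2(p)); zero-probability terms are skipped
    since p*log2(p) -> 0 as p -> 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini index = 1 - sum(p^2)."""
    return 1 - sum(p * p for p in probs)

def classification_error(probs):
    """Classification error = 1 - max(p)."""
    return 1 - max(probs)

pure = [1.0]            # single class: all three indices are 0
uniform = [0.5, 0.5]    # two equally likely classes: the maximum case

print(entropy(uniform), gini(uniform), classification_error(uniform))
```

For the two-class uniform distribution this prints the maxima 1.0, 0.5 and 0.5, illustrating that the Gini index and classification error share the same maximum 1 - 1/n.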
F. Splitting Criteria

To determine the best attribute for a particular node in the tree we use the measure called information gain. The information gain, Gain(S, A), of an attribute A relative to a collection of examples S is defined as

    Gain(S, A) = Entropy(S) - SUM_{v in Values(A)} (|S_v| / |S|) * Entropy(S_v)

where Values(A) is the set of all possible values for attribute A, and S_v is the subset of S for which attribute A has value v (i.e., S_v = {s in S | A(s) = v}). The first term in the equation for Gain is just the entropy of the original collection S, and the second term is the expected value of the entropy after S is partitioned using attribute A: the sum of the entropies of each subset S_v, weighted by the fraction |S_v|/|S| of examples that belong to S_v. Gain(S, A) is therefore the expected reduction in entropy caused by knowing the value of attribute A. Further,

    Split Information(S, A) = - SUM_{i=1}^{c} (|S_i| / |S|) * log2(|S_i| / |S|)

and

    Gain Ratio(S, A) = Gain(S, A) / Split Information(S, A)

The process of selecting a new attribute and partitioning the training examples is repeated for each non-terminal descendant node. Attributes that have been incorporated higher in the tree are excluded, so that any given attribute can appear at most once along any path through the tree. This process continues for each new leaf node until either of two conditions is met:

1. Every attribute has already been included along this path through the tree, or
2. The training examples associated with this leaf node all have the same target attribute value (i.e., their entropy is zero).

G. The ID3 Algorithm

ID3 (Examples, Target_Attribute, Attributes)
 Create a root node for the tree.
 If all examples are positive, return the single-node tree Root, with label = +.
 If all examples are negative, return the single-node tree Root, with label = -.
 If the set of predicting attributes is empty, return the single-node tree Root, with label = the most common value of the target attribute in the examples.
 Otherwise begin:
   o A = the attribute that best classifies the examples.
   o Decision tree attribute for Root = A.
   o For each possible value v_i of A:
      Add a new tree branch below Root, corresponding to the test A = v_i.
      Let Examples(v_i) be the subset of examples that have the value v_i for A.
      If Examples(v_i) is empty, then below this new branch add a leaf node with label = the most common target value in the examples.
      Else below this new branch add the subtree ID3(Examples(v_i), Target_Attribute, Attributes - {A}).
   o End
 Return Root.

V. RESULTS AND DISCUSSION

The data set of 50 students used in this study was obtained from the Computer Applications department of VBS Purvanchal University, Jaunpur (Uttar Pradesh), course MCA (Master of Computer Applications), sessions 2007 to 2010.

TABLE II. DATA SET

S.No.  PSM     CTG      SEM      ASS  GP   ATT      LW   ESM
1.     First   Good     Good     Yes  Yes  Good     Yes  First
2.     First   Good     Average  Yes  No   Good     Yes  First
3.     First   Good     Average  No   No   Average  No   First
4.     First   Average  Good     No   No   Good     Yes  First
5.     First   Average  Average  No   Yes  Good     Yes  First
6.     First   Poor     Average  No   No   Average  Yes  First
7.     First   Poor     Average  No   No   Poor     Yes  Second
8.     First   Average  Poor     Yes  Yes  Average  No   First
9.     First   Poor     Poor     No   No   Poor     No   Third
10.    First   Average  Average  Yes  Yes  Good     No   First
11.    Second  Good     Good     Yes  Yes  Good     Yes  First
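The ID3 pseudocode above can be turned into a compact runnable sketch. The version below (added as an aid, not from the paper) generalizes the two-label (+/-) base case to any set of class labels, and the tiny training set is an assumption for illustration; it is not the paper's 50-student data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def id3(examples, target, attributes):
    """examples: list of dicts; target: class key; attributes: candidate keys.
    Returns a nested-dict tree, with class labels at the leaves."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:            # pure node: a single class remains
        return labels[0]
    if not attributes:                   # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                         # information gain of attribute a
        groups = Counter(ex[a] for ex in examples)
        remainder = sum(cnt / len(examples) *
                        entropy([ex[target] for ex in examples if ex[a] == v])
                        for v, cnt in groups.items())
        return entropy(labels) - remainder

    best = max(attributes, key=gain)     # attribute that best classifies
    tree = {best: {}}
    for v in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == v]
        tree[best][v] = id3(subset, target,
                            [a for a in attributes if a != best])
    return tree

# Assumed toy data in the spirit of Table I (not the actual 50 records)
data = [
    {"PSM": "First", "ATT": "Good", "ESM": "First"},
    {"PSM": "First", "ATT": "Poor", "ESM": "Second"},
    {"PSM": "Third", "ATT": "Good", "ESM": "Second"},
    {"PSM": "Third", "ATT": "Poor", "ESM": "Third"},
]
tree = id3(data, "ESM", ["PSM", "ATT"])
print(tree)
```

On this toy set the tree splits on PSM at the root and on ATT below it, mirroring the top-down, greedy attribute selection the pseudocode describes.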
12.    Second  Good     Average  Yes  Yes  Good     Yes  First
13.    Second  Good     Average  Yes  No   Good     No   First
14.    Second  Average  Good     Yes  Yes  Good     No   First
15.    Second  Good     Average  Yes  Yes  Average  Yes  First
16.    Second  Good     Average  Yes  Yes  Poor     Yes  Second
17.    Second  Average  Average  Yes  Yes  Good     Yes  Second
18.    Second  Average  Average  Yes  Yes  Poor     Yes  Second
19.    Second  Poor     Average  No   Yes  Good     Yes  Second
20.    Second  Average  Poor     Yes  No   Average  Yes  Second
21.    Second  Poor     Average  No   Yes  Poor     No   Third
22.    Second  Poor     Poor     Yes  Yes  Average  Yes  Third
23.    Second  Poor     Poor     No   No   Average  Yes  Third
24.    Second  Poor     Poor     Yes  Yes  Good     Yes  Second
25.    Second  Poor     Poor     Yes  Yes  Poor     Yes  Third
26.    Second  Poor     Poor     No   No   Poor     Yes  Fail
27.    Third   Good     Good     Yes  Yes  Good     Yes  First
28.    Third   Average  Good     Yes  Yes  Good     Yes  Second
29.    Third   Good     Average  Yes  Yes  Good     Yes  Second
30.    Third   Good     Good     Yes  Yes  Average  Yes  Second
31.    Third   Good     Good     No   No   Good     Yes  Second
32.    Third   Average  Average  Yes  Yes  Good     Yes  Second
33.    Third   Average  Average  No   Yes  Average  Yes  Third
34.    Third   Average  Good     No   No   Good     Yes  Third
35.    Third   Good     Average  No   Yes  Average  Yes  Third
36.    Third   Average  Poor     No   No   Average  Yes  Third
37.    Third   Poor     Average  Yes  No   Average  Yes  Third
38.    Third   Poor     Average  No   Yes  Poor     Yes  Fail
39.    Third   Average  Average  No   Yes  Poor     Yes  Third
40.    Third   Poor     Poor     No   No   Good     No   Third
41.    Third   Poor     Poor     No   Yes  Poor     Yes  Fail
42.    Third   Poor     Poor     No   No   Poor     No   Fail
43.    Fail    Good     Good     Yes  Yes  Good     Yes  Second
44.    Fail    Good     Good     Yes  Yes  Average  Yes  Second
45.    Fail    Average  Good     Yes  Yes  Average  Yes  Third
46.    Fail    Poor     Poor     Yes  Yes  Average  No   Fail
47.    Fail    Good     Poor     No   Yes  Poor     Yes  Fail
48.    Fail    Poor     Poor     No   No   Poor     Yes  Fail
49.    Fail    Average  Average  Yes  Yes  Good     Yes  Second
50.    Fail    Poor     Good     No   No   Poor     No   Fail

To work out the information gain for an attribute A relative to S, we first need to calculate the entropy of S. Here S is a set of 50 examples: 14 "First", 15 "Second", 13 "Third" and 8 "Fail".

    Entropy(S) = - p_First log2(p_First) - p_Second log2(p_Second) - p_Third log2(p_Third) - p_Fail log2(p_Fail)
               = - (14/50) log2(14/50) - (15/50) log2(15/50) - (13/50) log2(13/50) - (8/50) log2(8/50)
               = 1.964

For A = PSM, the information gain is

    Gain(S, PSM) = Entropy(S) - (|S_First|/|S|) Entropy(S_First) - (|S_Second|/|S|) Entropy(S_Second)
                   - (|S_Third|/|S|) Entropy(S_Third) - (|S_Fail|/|S|) Entropy(S_Fail)

The gain values for all attributes are given in Table III.

TABLE III. GAIN VALUES

Gain           Value
Gain(S, PSM)   0.577036
Gain(S, CTG)   0.515173
Gain(S, SEM)   0.365881
Gain(S, ASS)   0.218628
Gain(S, GP)    0.043936
Gain(S, ATT)   0.451942
Gain(S, LW)    0.453513

PSM has the highest gain; therefore it is used as the root node, as shown in Figure 2.

Figure 2. PSM as the root node (branches: First, Second, Third, Fail)

The gain ratio can also be used for attribute selection; before calculating the gain ratio, the split information is shown in Table IV.

TABLE IV. SPLIT INFORMATION

Split           Value
Split(S, PSM)   1.386579
Split(S, CTG)   1.448442
Split(S, SEM)   1.597734
Split(S, ASS)   1.744987
Split(S, GP)    1.91968
Split(S, ATT)   1.511673
Split(S, LW)    1.510102

The gain ratio is shown in Table V.

TABLE V. GAIN RATIO

Gain Ratio            Value
Gain Ratio (S, PSM)   0.416158
Gain Ratio (S, CTG)   0.355674
Gain Ratio (S, SEM)   0.229
Gain Ratio (S, ASS)   0.125289
Gain Ratio (S, GP)    0.022887
Gain Ratio (S, ATT)   0.298968
Gain Ratio (S, LW)    0.30032

This process goes on until all data are classified perfectly or the algorithm runs out of attributes. The knowledge represented by the decision tree can be extracted and represented in the form of IF-THEN rules.
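As a quick check of the entropy figure above, the class distribution of S (14 First, 15 Second, 13 Third, 8 Fail out of 50) can be plugged into the entropy formula. This small sketch is an aid added here, not part of the paper:

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as absolute counts."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

# Class distribution of the 50-student set S: First, Second, Third, Fail
print(round(entropy([14, 15, 13, 8]), 3))  # 1.964, matching the text
```

The gain and gain-ratio tables follow by applying the same function to each attribute's partition of S and combining the results with the Gain and Split Information formulas of Section F.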
IF PSM = „First‟ AND CTG = „Good‟ AND ATT = “Good‟ [8] Galit.et.al, “Examining online learning processes based on log files OR „Average‟ THEN ESM = „First‟ analysis: a case study”. Research, Reflection and Innovations in Integrating ICT in Education 2007. IF PSM = „Second‟ AND ATT = „Good‟ AND ASS = „Yes‟ [9] Q. A. AI-Radaideh, E. W. AI-Shawakfa, and M. I. AI-Najjar, “Mining THEN ESM = „First‟ student data using decision trees”, International Arab Conference on IF PSM = „Second‟ AND CTG = „Average‟ AND LW = „Yes‟ Information Technology(ACIT'2006), Yarmouk University, Jordan, THEN ESM = „Second‟ IF PSM = „Third‟ AND CTG = „Good‟ OR „Average‟ AND [10] U. K. Pandey, and S. Pal, “A Data mining view on class room teaching language”, (IJCSI) International Journal of Computer Science Issue, ATT = “Good‟ OR „Average‟ THEN PSM = „Second‟ Vol. 8, Issue 2, pp. 277-282, ISSN:1694-0814, 2011. IF PSM = „Third‟ AND ASS = „No‟ AND ATT = „Average‟ [11] Shaeela Ayesha, Tasleem Mustafa, Ahsan Raza Sattar, M. Inayat Khan, THEN PSM = „Third‟ “Data mining model for higher education system”, Europen Journal of IF PSM = „Fail‟ AND CTG = „Poor‟ AND ATT = „Poor‟ Scientific Research, Vol.43, No.1, pp.24-29, 2010. THEN PSM = „Fail‟ [12] M. Bray, The shadow education system: private tutoring and its implications for planners, (2nd ed.), UNESCO, PARIS, France, 2007. Figure 3. Rule Set generated by Decision Tree [13] B.K. Bharadwaj and S. Pal. “Data Mining: A prediction for performance improvement using classification”, International Journal of Computer One classification rules can be generated for each path from Science and Information Security (IJCSIS), Vol. 9, No. 4, pp. 136-140, each terminal node to root node. Pruning technique was executed by removing nodes with less than desired number of [14] J. R. Quinlan, “Introduction of decision tree: Machine learn”, 1: pp. 86- objects. IF- THEN rules may be easier to understand is shown 106, 1986. in figure 3. [15] Vashishta, S. (2011). 
Efficient Retrieval of Text for Biomedical Domain using Data Mining Algorithm. IJACSA - International Journal of CONCLUSION Advanced Computer Science and Applications, 2(4), 77-80. [16] Kumar, V. (2011). An Empirical Study of the Applications of Data In this paper, the classification task is used on student Mining Techniques in Higher Education. IJACSA - International database to predict the students division on the basis of Journal of Advanced Computer Science and Applications, 2(3), 80-84. previous database. As there are many approaches that are used Retrieved from http://ijacsa.thesai.org. for data classification, the decision tree method is used here. Information‟s like Attendance, Class test, Seminar and AUTHORS PROFILE Assignment marks were collected from the student‟s previous Brijesh Kumar Bhardwaj is Assistant Professor in the database, to predict the performance at the end of the semester. Department of Computer Applications, Dr. R. M. L. Avadh University Faizabad India. He obtained his M.C.A degree This study will help to the students and the teachers to from Dr. R. M. L. Avadh University Faizabad (2003) and improve the division of the student. This study will also work M.Phil. in Computer Applications from Vinayaka mission University, Tamilnadu. He is currently doing research in to identify those students which needed special attention to Data Mining and Knowledge Discovery. He has published reduce fail ration and taking appropriate action for the next one international paper. semester examination. Saurabh Pal received his M.Sc. (Computer Science) REFERENCES from Allahabad University, UP, India (1996) and obtained [1] Heikki, Mannila, Data mining: machine learning, statistics, and his Ph.D. degree from the Dr. R. M. L. Awadh University, databases, IEEE, 1996. Faizabad (2002). He then joined the Dept. of Computer Applications, VBS Purvanchal University, Jaunpur as [2] U. Fayadd, Piatesky, G. Shapiro, and P. Smyth, From data mining to Lecturer. 
At present, he is working as Head and Sr. Lecturer knowledge discovery in databases, AAAI Press / The MIT Press, at Department of Computer Applications. Massachusetts Institute Of Technology. ISBN 0–262 56097–6, 1996. [3] J. Han and M. Kamber, “Data Mining: Concepts and Techniques,” Saurabh Pal has authored a commendable number of research papers in Morgan Kaufmann, 2000. international/national Conference/journals and also guides research scholars in Computer Science/Applications. He is an active member of IACSIT, CSI, [4] Alaa el-Halees, “Mining students data to analyze e-Learning behavior: A Society of Statistics and Computer Applications and working as Case Study”, 2009.. Reviewer/Editorial Board Member for more than 15 international journals. His [5] U . K. Pandey, and S. Pal, “Data Mining: A prediction of performer or research interests include Image Processing, Data Mining, Grid Computing and underperformer using classification”, (IJCSIT) International Journal of Artificial Intelligence. Computer Science and Information Technology, Vol. 2(2), pp.686-690, ISSN:0975-9646, 2011. [6] S. T. Hijazi, and R. S. M. M. Naqvi, “Factors affecting student‟s performance: A Case of Private Colleges”, Bangladesh e-Journal of Sociology, Vol. 3, No. 1, 2006. 69 | P a g e www.ijacsa.thesai.org
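The entropy and gain-ratio arithmetic reported above can be checked with a few lines of Python. The sketch below is illustrative, not part of the original study: the `entropy` and `gain_ratio` helpers are hypothetical names, the 14/15/13/8 class counts come from the 50 training examples, and the Split(S, PSM) and Gain Ratio(S, PSM) values are taken from Tables IV and V.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Class distribution of the 50 training examples:
# 14 "First", 15 "Second", 13 "Third", 8 "Fail".
print(round(entropy([14, 15, 13, 8]), 3))  # prints 1.964, matching Entropy(S)

def gain_ratio(info_gain, split_info):
    """C4.5-style gain ratio: information gain normalised by split information."""
    return info_gain / split_info

# Table IV gives Split(S, PSM) = 1.386579 and Table V gives
# Gain Ratio(S, PSM) = 0.416158, so the implied information gain is:
implied_gain = 0.416158 * 1.386579
print(round(implied_gain, 3))  # roughly 0.577
```

Since PSM has the largest gain ratio of any attribute, it is chosen as the root of the decision tree, which is consistent with every rule in Figure 3 testing PSM first.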



Published: Jan 1, 2011
