Fuzzy Inf. Eng. (2009) 2:149-159
DOI 10.1007/s12543-009-0012-2

ORIGINAL ARTICLE

A Survey of Fuzzy Decision Tree Classifier

Yi-lai Chen · Tao Wang · Ben-sheng Wang · Zhou-jun Li

Received: 30 December 2008 / Revised: 26 April 2009 / Accepted: 10 May 2009
© Springer and Fuzzy Information and Engineering Branch of the Operations Research Society of China

Yi-lai Chen · Tao Wang (✉) · Ben-sheng Wang
Nanjing Army Command College, Nanjing 210045, P.R. China
e-mail: InsistStar@nudt.edu.cn

Zhou-jun Li
School of Computer Science & Engineering, Beihang University, Beijing 100083, P.R. China

Abstract
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for its comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Over the years, additional methodologies have been investigated and proposed to deal with continuous or multi-valued data, and with missing or noisy features. Recently, with the growing popularity of fuzzy representation, some researchers have proposed to utilize fuzzy representation in decision trees to deal with similar situations. This paper presents a survey of current methods for Fuzzy Decision Tree (FDT) design and the various existing issues. After considering the potential advantages of FDT classifiers over traditional decision tree classifiers, we discuss the subjects of FDT, including attribute selection criteria, inference for decision assignment and stopping criteria. To the best of our knowledge, this is the first overview of fuzzy decision tree classifiers.

Keywords Fuzzy decision tree · Classifier · Attribute selection · Decision assignment · Stopping criteria

1. Introduction
Decision trees are one of the most popular methods for learning and reasoning from feature-based examples. They have undergone a number of alterations to deal with language and measurement uncertainties. Fuzzy decision trees (FDT) are one such extension; they aim at combining symbolic decision trees with the approximate reasoning offered by fuzzy representation. The intent is to exploit the complementary advantages of both: the popularity in applications to learn from examples and the high knowledge comprehensibility of decision trees, and the ability of fuzzy representation to deal with inexact and uncertain information [2].

To date, there have been roughly a dozen publications in this field. Ichihashi et al. extract fuzzy reasoning rules viewed as fuzzy partitions [6]. An algebraic method to facilitate incremental learning is also employed. Xizhao and Hong discretize continuous attributes using fuzzy numbers and possibility theory [7]. Pedrycz and Sosnowski, on the other hand, employ context-based fuzzy clustering for this purpose [8]. Yuan and Shaw induce a fuzzy decision tree by reducing classification ambiguity with fuzzy evidence [9]. The input data are fuzzified using triangular membership functions around cluster centers obtained using Kohonen's feature map [24]. Wang et al. present optimization principles of fuzzy decision trees based on minimizing the total number and average depth of leaves, proving that the algorithmic complexity of constructing a minimum tree is NP-hard. Fuzzy entropy and classification ambiguity are minimized at node level, and fuzzy clustering is used to merge branches [10]. Wang et al. propose an incremental fuzzy decision tree method to mine data streams, which uses soft discretization to improve the handling of noisy data and utilizes a new technique named TBST for inserting new examples and calculating the best split-test point efficiently [36].
The rest of the paper is organized as follows: Section 2 contains the preliminaries, definitions and terminology needed for later sections; Section 3 explains the motivations behind FDT and their potential uses and drawbacks; Section 4 addresses the problems of attribute selection criteria, inference for decision assignment and stopping criteria. A summary and conclusions are provided in Section 5.

2. Preliminaries
We briefly describe the terminology needed for describing fuzzy decision trees.
• The set of fuzzy variables is denoted by $V = \{V_1, V_2, \cdots, V_n\}$.
• For each variable $V_i \in V$:
  – Crisp example data is $u^i \in U_i$.
  – $D_i$ denotes the set of fuzzy terms.
  – $v_p^i$ denotes the fuzzy term $p$ for the variable $V_i$ (e.g., $v_{Low}^{Income}$, as necessary to stress the variable or with anonymous values; otherwise $p$ alone may be used).
• The set of fuzzy terms for the decision variable is denoted by $D_c$.
• The set of training examples is $E = \{e_j \mid e_j = (u_j^1, \cdots, u_j^n, y_j)\}$, where $y_j$ is the crisp classification. Confidence weights of the training examples are denoted by $W = \{w_j\}$, where $w_j$ is the weight for $e_j \in E$.
• For each node $N$ of the fuzzy decision tree:
  – $F^N$ denotes the set of fuzzy restrictions on the path leading to $N$.
  – $V^N$ is the set of attributes appearing on the path leading to $N$: $V^N = \{V_i \mid \exists p\,([V_i \text{ is } v_p^i] \in F^N)\}$.
  – $x^N = \{x_j^N\}$ is the set of memberships in $N$ for all the training examples.
  – $N|v_p^i$ denotes the particular child of node $N$ created by using $V_i$ to split $N$ and following the edge $v_p^i \in D_i$.
  – $S_{V_i}^N$ denotes the set of $N$'s children when $V_i \in (V - V^N)$ is used for the split. Note that $S_{V_i}^N = \{(N|v_p^i) \mid v_p^i \in D_i^N\}$, where $D_i^N = \{v_p^i \in D_i \mid \exists (e_j \in E)\,(x_j^N > 0 \wedge \mu_{v_p^i}(u_j^i) > 0)\}$; in other words, there are no nodes containing no training examples, and thus some linguistic terms may not be used to create subtrees.
  – $P_k^N$ denotes the example count for decision $v_k^c \in D_c$ in node $N$. It is important to note that unless the sets are such that the sum of all memberships for any $u$ is 1, $P_k^N \neq \sum_{v_p^i \in D_i} P_k^{N|v_p^i}$; that is, the membership sum from all children of $N$ can differ from that of $N$. This is due to the fuzzy sets: the total membership can either increase or decrease while building the tree.
  – $P^N$ and $I^N$ denote the total example count and information measure for node $N$.
  – $G_i^N = I^N - I^{S_{V_i}^N}$ denotes the information gain when using $V_i$ in $N$ ($I^{S_{V_i}^N}$ is the weighted information content).
• $\alpha$ denotes the area and $\varsigma$ denotes the centroid of a fuzzy set.
• $I(x^N)$ denotes the entropy of the class distribution w.r.t. the fuzzy example set $x^N$ in node $N$. $I(x^N|A_i)$ is the weighted sum of entropies from all child nodes if $A_i$ is used as the test attribute in node $N$.
• $Gain(x^N, A_i) = I(x^N) - I(x^N|A_i)$ is the information gain w.r.t. attribute $A_i$, which is the first of the two attribute selection measures we consider.
• $SplitI(x^N, A_i)$ denotes the split information, i.e., the entropy w.r.t. the value distribution of attribute $A_i$ (instead of the class distribution).
• $GainR(x^N, A_i) = Gain(x^N, A_i) / SplitI(x^N, A_i)$ is the information gain ratio w.r.t. attribute $A_i$, which is the second attribute selection measure we consider.

3. Potentials and Problems of Fuzzy Decision Tree Classifiers
Decision trees are one of the most popular choices for learning and reasoning from feature-based examples. Fuzzy decision trees aim at combining symbolic decision trees with the approximate reasoning offered by fuzzy representation. They are attractive for the following reasons [14]:
• An apparent advantage of fuzzy decision trees is that they use the same routines as symbolic decision trees (but with fuzzy representation). This allows for utilization of the same comprehensible tree structure for knowledge understanding and verification.
  This also allows more robust processing with continuously gradual outputs. Moreover, one may easily incorporate rich methodologies for dealing with missing features and incomplete trees. For example, suppose that the inference descends to a node which does not have a branch for the corresponding feature of the sample (perhaps due to tree pruning, which often improves generalization properties). This dangling feature can be fuzzified, and its match to the fuzzy restrictions associated with the available branches then provides better-than-uniform discernibility among those children.
• Fuzzy decision trees can process data expressed with symbolic values, numerical values (more information) and fuzzy terms. Because fuzzy restrictions are evaluated using fuzzy membership functions, this process provides a linkage between continuous domain values and abstract features.
• Fuzzy sets and approximate reasoning allow for the processing of noisy, inconsistent and incomplete data. On such data, fuzzy decision trees can be more accurate than standard decision trees.

The possible drawbacks of FDT, on the other hand, are:
• From a computational point of view, the method is intrinsically slower than crisp tree induction. This is the price paid for having a more accurate but still interpretable classifier.
• FDT does not seem to offer fundamentally new concepts or induction principles for the design of learning algorithms, comparable, e.g., to the ideas of resampling and ensemble learning (like bagging and boosting) or the idea of margin maximization underlying kernel-based learning methods. Hullermeier doubts that FDT will be very conducive to generalization performance and model accuracy [30].

4. Special Issues of Fuzzy Decision Tree Classifiers
Like classical decision trees built with the ID3 algorithm, fuzzy decision trees are constructed in a top-down manner by recursive partitioning of the training set into subsets.
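To make the partitioning step concrete, the following is a minimal sketch of how one node could pass its examples to fuzzy children. It is not taken from any of the surveyed papers: the helper names (`ramp_down`, `ramp_up`, `split_node`), the "income" attribute, the linear membership shapes and the use of `min` as the t-norm are all illustrative assumptions.

```python
from typing import Callable, Dict, List

Membership = Callable[[float], float]


def ramp_down(lo: float, hi: float) -> Membership:
    """Membership that is 1 below `lo`, 0 above `hi`, and linear in between."""
    def mu(u: float) -> float:
        return min(1.0, max(0.0, (hi - u) / (hi - lo)))
    return mu


def ramp_up(lo: float, hi: float) -> Membership:
    """Complement of ramp_down, so the two terms sum to 1 everywhere."""
    down = ramp_down(lo, hi)
    return lambda u: 1.0 - down(u)


def split_node(
    memberships: List[float],
    values: List[float],
    terms: Dict[str, Membership],
) -> Dict[str, List[float]]:
    """Child memberships per fuzzy term, using min as the t-norm (an assumption)."""
    return {
        name: [min(x, mu(u)) for x, u in zip(memberships, values)]
        for name, mu in terms.items()
    }


# Hypothetical "income" attribute with two overlapping fuzzy terms.
terms = {"Low": ramp_down(30.0, 70.0), "High": ramp_up(30.0, 70.0)}
node_memberships = [1.0, 1.0, 0.6, 1.0]   # x_j^N for four training examples
incomes = [20.0, 40.0, 50.0, 80.0]        # crisp attribute values u_j

children = split_node(node_memberships, incomes, terms)
for name, child in children.items():
    print(name, child, "total:", sum(child))
print("parent total:", sum(node_memberships))
```

Unlike a crisp split, an example can fall into several children at once. In this toy data the third example has membership 0.6 in the parent but 0.5 in each child under the `min` t-norm, so the children's total membership exceeds the parent's, which is the effect noted for $P_k^N$ in Section 2.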
Here, we list some special issues of fuzzy decision trees:
• Attribute selection criteria in fuzzy decision trees
  A standard method to select a test attribute in classical decision trees is to choose the attribute that yields the highest information gain. Applied directly to fuzzy decision trees, this leads to some problems, so modifications and enhancements of the basic algorithm have been proposed and studied.
• Inference for decision assignment
  The inference procedure is an important part of FDT, and it differs from that of traditional decision trees.
• Stopping criteria
  Classical tree learning is usually terminated if all attributes have already been used on the current path, or if all examples in the current node belong to the same class. In FDT, an example may occur in any node with any membership degree. Thus, in general, more examples are considered per node, and fuzzy trees are usually larger than classical trees. Many methods have been proposed to address this problem.

4.1 Attribute Selection Criteria in Fuzzy Decision Trees
One important aspect of tree induction is the choice of feature at each stage of construction. If weak features are selected, the resulting decision tree will be meaningless and will exhibit poor performance. Several methods have been proposed for attribute selection in fuzzy decision trees. They can be categorized into five kinds.

4.1.1 Attribute Selection Criteria Based on Information Gain
Tests performed on data (corresponding to selecting attributes to be tested in tree nodes) are decided based on some criterion. The most commonly used is information gain, which is computationally simple and has been shown to be effective: select an attribute for testing (or a new threshold on a continuous domain) such that the difference between the information contained in a given node and that contained in its children nodes is maximized.
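As a rough illustration of this selection rule, the sketch below computes $Gain$ and $GainR$ as defined in Section 2 from fuzzy example counts. It rests on several assumptions that are not from the source: class labels are crisp, so an example contributes its node membership to the count of its own class; child entropies are weighted by each child's share of the children's total membership; and all function names are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Tuple


def class_counts(memberships: List[float], labels: List[Hashable]) -> Dict[Hashable, float]:
    """Fuzzy example count per class: the sum of memberships of its examples."""
    counts: Dict[Hashable, float] = {}
    for x, y in zip(memberships, labels):
        counts[y] = counts.get(y, 0.0) + x
    return counts


def entropy(memberships: List[float], labels: List[Hashable]) -> float:
    """Shannon entropy (base 2) of the fuzzy class distribution in a node."""
    counts = class_counts(memberships, labels)
    total = sum(counts.values())
    if total == 0.0:
        return 0.0
    return -sum((p / total) * math.log2(p / total) for p in counts.values() if p > 0.0)


def gain_and_ratio(
    memberships: List[float],
    labels: List[Hashable],
    children: Dict[str, List[float]],
) -> Tuple[float, float]:
    """Information gain and gain ratio of a split given child memberships."""
    totals = {term: sum(m) for term, m in children.items()}
    all_total = sum(totals.values())
    if all_total == 0.0:
        return 0.0, 0.0
    # Weighted sum of child entropies, I(x^N | A_i).
    weighted = sum(
        (totals[term] / all_total) * entropy(m, labels) for term, m in children.items()
    )
    gain = entropy(memberships, labels) - weighted
    # Split information: entropy of the distribution of membership over children.
    split_info = -sum(
        (s / all_total) * math.log2(s / all_total) for s in totals.values() if s > 0.0
    )
    ratio = gain / split_info if split_info > 0.0 else 0.0
    return gain, ratio


labels = ["a", "a", "b", "b"]
parent = [1.0, 1.0, 1.0, 1.0]

# A crisp split that separates the classes perfectly.
crisp = {"lo": [1.0, 1.0, 0.0, 0.0], "hi": [0.0, 0.0, 1.0, 1.0]}
# A fuzzy split in which the middle examples belong partly to both children.
fuzzy = {"Low": [1.0, 0.75, 0.5, 0.0], "High": [0.0, 0.25, 0.5, 1.0]}

print(gain_and_ratio(parent, labels, crisp))
print(gain_and_ratio(parent, labels, fuzzy))
```

The overlapping split yields a smaller gain than the crisp one because both children still contain a mixture of classes. This plain version makes no attempt to guarantee a non-negative gain when memberships do not sum to 1 across children.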
The information content is measured according to
$$I^N = -\sum_{k=1}^{|D_c|} \left( \frac{P_k^N}{P^N} \cdot \log \frac{P_k^N}{P^N} \right),$$
where
$$P_k^N = \sum_{j=1}^{|E|} f_2\left(x_j^N, \mu_{v_k^c}(y_j)\right), \qquad P^N = \sum_{k=1}^{|D_c|} P_k^N.$$

4.1.2 Attribute Selection Criteria Based on Gain Ratio
Gain ratio is used in classical decision trees to select the test attribute in order to reduce the natural bias of information gain, i.e., the fact that it favors attributes with many values (which may lead to a model of low predictive power). In FDT, fuzzy partitions are created for all attributes before the tree induction. To keep the tree simple, each partition usually possesses as few fuzzy sets as possible. The information gain ratio is measured according to
$$GainR(x^N, A_i) = \frac{Gain(x^N, A_i)}{SplitI(x^N, A_i)}.$$

4.1.3 Attribute Selection Criteria Based on Extended Information Measure
As mentioned in [16], negative information gain (ratio) can occur in FDT, so the authors suggest a different way of computing the information measure in FDT to make information gain ratio applicable as a selection measure. To determine the best test attribute, [16] creates a fuzzy contingency table (see Table 1) for each candidate attribute $A$ in node $N$, from which the information measure for attribute $A$ can be computed. The gain ratio of an attribute is calculated in 9 steps, and the authors point out that the result will not be negative. The details are given in [16].

Table 1: A fuzzy contingency table
$N(A)$    $C_1$               $C_2$               Sum
$a_1$     $Z_{C_1}^{N|a_1}$   $Z_{C_2}^{N|a_1}$   $Z^{N|a_1}$
$a_2$     $Z_{C_1}^{N|a_2}$   $Z_{C_2}^{N|a_2}$   $Z^{N|a_2}$
Sum       $Z_{C_1}^{N}$       $Z_{C_2}^{N}$       $Z^{N}$

4.1.4 Attribute Selection Criteria Based on Yuan & Shaw's Measure
A method is introduced in [9] to construct FDT using a measure of classification ambiguity as the measure of discrimination. This measure is defined from both a measure of fuzzy subsethood and a measure of non-specificity.
The measure of ambiguity $H_Y(C|A_j)$ is used to measure the discriminating power of attribute $A_j$ with regard to $C$:
$$H_Y(C|A_j) = \sum_{l} w(V_l) \cdot G_Y(V_l), \qquad w(V_l) = \frac{M(V_l)}{\sum_{t} M(V_t)}, \qquad M(V_l) = \sum_{x \in X} \mu_{V_l}(x),$$
where the sums over $l$ and $t$ run over the fuzzy terms of $A_j$. The details of the calculation of $G_Y(V_l)$ can be found in [9].

4.1.5 Attribute Selection Criteria Based on Fuzzy-rough Sets
It has been shown that the fuzzy-rough metric is a useful gauge of the information content of (discrete and real-valued) attributes in datasets. It has been employed primarily within the feature selection task, to benefit the rule induction that follows this process [22].
The fuzzy-rough measure is comparable with the leading measures of feature importance. Its behavior is quite similar to that of the information gain and gain ratio metrics. The results show that the fuzzy-rough measure performs comparably to fuzzy ID3 for fuzzy datasets, and better than it for crisp data [22].

4.2 Inference for Decision Assignment
The most profound differences between FDT and DT are in the process of classifying a new sample. These differences arise from the fact that
• FDT have leaves that are more likely to contain samples of different classes (with different degrees of match).
• The inference procedure is likely to match the new sample against multiple leaves, with varying degrees of match.
To account for these potential problems, a number of inference routines have been proposed. Some follow the idea of classifying a new sample directly from the built FDT; others generate rules first and then utilize these rules to classify new samples.

4.2.1 Classifying a New Sample Depending on FDT
To decide the classification assigned to a sample using FDT, we have to find the leaves whose restrictions are satisfied by the sample, and combine their decisions into a single crisp response.
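One simple way to realize such a procedure is sketched below. It is only an illustration under stated assumptions, not the inference routine of any surveyed paper: the satisfaction of a leaf is taken as the minimum membership over its restrictions, each leaf votes with its normalized class counts weighted by that satisfaction, and the class with the largest total wins. The leaf data, the "income" attribute and the function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Membership = Callable[[float], float]
# A leaf is (restrictions on the path, fuzzy example count per class).
Leaf = Tuple[List[Tuple[str, Membership]], Dict[str, float]]


def classify(sample: Dict[str, float], leaves: List[Leaf]) -> str:
    """Combine the decisions of all satisfied leaves into one crisp class."""
    scores: Dict[str, float] = {}
    for restrictions, counts in leaves:
        # Degree to which the sample satisfies every restriction (min is assumed).
        satisfaction = min((mu(sample[attr]) for attr, mu in restrictions), default=1.0)
        total = sum(counts.values())
        if satisfaction == 0.0 or total == 0.0:
            continue
        # Each leaf votes for its classes in proportion to its example counts.
        for label, count in counts.items():
            scores[label] = scores.get(label, 0.0) + satisfaction * count / total
    if not scores:
        raise ValueError("no leaf is satisfied by the sample")
    return max(scores, key=lambda label: scores[label])


def low(u: float) -> float:
    """Hypothetical 'Low income' term: 1 below 30, 0 above 70."""
    return min(1.0, max(0.0, (70.0 - u) / 40.0))


def high(u: float) -> float:
    """Hypothetical 'High income' term, the complement of `low`."""
    return 1.0 - low(u)


# Two leaves that both contain examples of conflicting classes.
leaves: List[Leaf] = [
    ([("income", low)], {"reject": 3.0, "accept": 1.0}),
    ([("income", high)], {"reject": 0.5, "accept": 3.5}),
]

print(classify({"income": 60.0}, leaves))
print(classify({"income": 35.0}, leaves))
```

Both leaves are partially satisfied by each sample, so the result depends on how strongly each leaf matches and on how mixed its examples are. Other choices of conjunction and aggregation operators would give different, equally legitimate inference schemes.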
Such decisions are very likely to have conflicts, found both in a single leaf (non-unique classification of its examples) and across different leaves (different satisfied leaves have different examples, possibly with conflicting classifications).
To define such a decision procedure, [2] defines four operators: $g_0$, $g_1$, $g_2$, $g_3$. Let $L$ be the set of leaves of the FDT and $l \in L$ be a leaf. First, for a sample $e$, the satisfaction of each restriction $[V_i \text{ is } v_p^i] \in F^l$ is computed (using $g_0$), and all results are combined with $g_1$ to determine to what degree the combined restrictions are satisfied. Then $g_2$ propagates this satisfaction to determine the level of satisfaction of that leaf. A given leaf by itself may contain inconsistent information if it contains examples of different classes. Different satisfied leaves must also have their decisions combined. Because of this 2-level disjunctive representation, the choice of $g_3$ is distributed into these two levels.

4.2.2 Classifying a New Sample Depending on Rule Base
The most frequent application of FDT is the induction or the adaptation of rule-based models. Plenty of methods have been developed for inducing a fuzzy rule base from given data. The main difference among them concerns the way in which individual rules or their condition parts are learned [14,16,21,32,34]. One possibility is to identify regions in the input space that seem to be qualified to form the condition part of a rule. This can be done by looking for clusters using clustering algorithms, or by identifying hyperboxes in the manner of so-called covering (separate and conquer) algorithms. By projecting the regions thus obtained onto the various dimensions of the input space, rule antecedents of the form $X \in A$ are obtained, where $X$ is an individual attribute and $A$ is a fuzzy set (the projection of the fuzzy region). The condition part of the rule is then given by the conjunction of these antecedents.
This approach is relatively flexible, though it suffers from the disadvantage that each rule makes use of its own fuzzy sets. Thus, the complete rule base might be difficult to interpret [30].
An alternative is to proceed from a fixed fuzzy partition for each attribute, i.e., a regular "fuzzy grid" of the input space, and to consider each cell of this grid as a potential antecedent part of a rule. This approach is advantageous from an interpretability point of view. On the other hand, it is less flexible and may produce inaccurate models when the one-dimensional partitions define a multi-dimensional grid that does not reflect the structure of the data [30].
Recently, an important development in the field of fuzzy rule learning has been hybrid methods that combine FDT with other methodologies, notably evolutionary algorithms and neural networks [6,13,22,31,32]. For example, evolutionary algorithms are often used to optimize a fuzzy rule base or to search the space of potential rule bases in a systematic way. Neuro-fuzzy methods are also quite interesting [28]. For example, one idea is to encode a fuzzy system as a neural network and to apply standard methods (like back propagation) in order to train such a network. This way, neuro-fuzzy systems combine the representational advantages of fuzzy systems with the flexibility and adaptivity of neural networks.

4.3 Stopping Criteria
One of the challenges in fuzzy decision tree induction is to develop algorithms that produce fuzzy decision trees of small size and depth. In part, smaller fuzzy decision trees lead to lower computational expense in determining the class of a test instance. More significantly, however, larger fuzzy decision trees lead to poorer generalization performance. Motivated by these considerations, a large number of algorithms have been proposed for producing smaller decision trees. Broadly, these may be classified into three categories [13].
• The first category includes those efforts that are based on different criteria to split the instances at each node. Some examples of the different node splitting criteria include entropy or its variants, the chi-square statistic, the G statistic, and the GINI index of diversity. Despite these efforts, there appears to be no single node-splitting criterion that performs best in all cases; nonetheless, there is little doubt that random splitting performs the worst. For FDT, an additional condition is usually added, namely whether the information measure is below a specified threshold. The threshold defined here enables a user to control the tree growth, so that unnecessary nodes are not added.
• The second category is based on pruning a decision tree either during the construction of the tree or after the tree has been constructed. In either case, the idea is to remove branches with little statistical validity.
• The third category of efforts toward producing smaller fuzzy decision trees is motivated by the fact that a locally optimum decision at a node may give rise to the possibility of instances at the node being split along branches, such that instances along some or all of the branches require a large number of additional nodes for classification. The so-called look-ahead methods attempt to establish a decision at a node by analyzing the classifiability of instances along each of the branches of a split. Surprisingly, mixed results (ranging from look-ahead making no difference to look-ahead producing larger trees) are reported in the literature when look-ahead is used.

4.4 Incremental FDT for Data Stream Mining
More recently, the data mining community has focused on a new model of data processing, in which data arrives in the form of continuous streams. The key issue in mining data streams is that only one pass is allowed over the entire data. Moreover, there is a real-time constraint, i.e.,
the processing time is limited by the rate of arrival of instances in the data stream, and the memory and disk available to store any summary information may be bounded. For most data mining problems, a one-pass algorithm cannot be very accurate. The existing algorithms typically achieve either a deterministic bound on the accuracy or a probabilistic bound. Peng et al. propose the soft discretization method in the traditional data mining field, which addresses the problem of noisy data and improves classification accuracy [25].
Soft discretization can be viewed as an extension of hard discretization, and the classical information measures defined in the probability domain have been extended to new definitions in the possibility domain based on fuzzy set theory [13]. A crisp set $A_c$ is expressed with a sharp characterization function $A_c(a): \Omega \to \{0, 1\},\ a \in \Omega$; alternatively, a fuzzy set $A$ is characterized with a membership function $A(a): \Omega \to [0, 1],\ a \in \Omega$. The membership $A(a)$ is called the possibility of $A$ to take a value $a \in \Omega$ [35]. The probability of a fuzzy set $A$ is defined, according to Zadeh [15], by
$$P_F(A) = \int_{\Omega} A(a)\, dP,$$
where $dP$ is a probability measure on $\Omega$, and the subscript $F$ is used to denote the associated fuzzy terms. In particular, if $A$ is defined on a discrete domain $\Omega = \{a_1, \cdots, a_i, \cdots, a_m\}$ and the probability of $a_i$ is $P(a_i) = p_i$, then its probability is
$$P_F(A) = \sum_{i=1}^{m} A(a_i)\, p_i.$$
Let $Q = \{A_1, \cdots, A_k\}$ be a family of fuzzy sets on $\Omega$. $Q$ is called a fuzzy partition of $\Omega$ when $\sum_{r=1}^{k} A_r(a) = 1,\ \forall a \in \Omega$ [16].
A hard discretization is defined with a threshold $T$, which generates the boundary between two crisp sets. Alternatively, a soft discretization is defined by a fuzzy set pair which forms a fuzzy partition. In contrast to the classical method of non-overlapping partitioning, the soft discretization is overlapped.
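The difference between the two discretizations, and the discrete form of $P_F(A)$, can be illustrated with a small sketch. The linear membership shape, the `width` parameter controlling the overlap, and all function names are assumptions made for illustration; the source fixes only that the pair must form a fuzzy partition.

```python
from typing import Callable, List, Tuple

Membership = Callable[[float], float]


def hard_pair(threshold: float) -> Tuple[Membership, Membership]:
    """Two crisp sets separated by a sharp boundary at `threshold`."""
    def a1(a: float) -> float:
        return 1.0 if a <= threshold else 0.0

    def a2(a: float) -> float:
        return 1.0 - a1(a)

    return a1, a2


def soft_pair(threshold: float, width: float) -> Tuple[Membership, Membership]:
    """Overlapping fuzzy set pair with both memberships 0.5 at `threshold`.

    The memberships change linearly over an interval of length `width`
    centered on `threshold` (an assumed shape).
    """
    def a1(a: float) -> float:
        return min(1.0, max(0.0, 0.5 - (a - threshold) / width))

    def a2(a: float) -> float:
        return 1.0 - a1(a)

    return a1, a2


def fuzzy_probability(A: Membership, values: List[float], probs: List[float]) -> float:
    """Discrete P_F(A): the sum of A(a_i) * p_i over the domain."""
    return sum(A(a) * p for a, p in zip(values, probs))


values = [40.0, 45.0, 50.0, 55.0, 60.0]
probs = [0.2, 0.2, 0.2, 0.2, 0.2]

soft1, soft2 = soft_pair(threshold=50.0, width=20.0)
hard1, hard2 = hard_pair(threshold=50.0)

# Both pairs satisfy the fuzzy partition condition A1(a) + A2(a) = 1.
print(all(abs(soft1(a) + soft2(a) - 1.0) < 1e-12 for a in values))
print(fuzzy_probability(soft1, values, probs), fuzzy_probability(soft2, values, probs))
print(fuzzy_probability(hard1, values, probs), fuzzy_probability(hard2, values, probs))
```

With the soft pair, a value such as 45 belongs mostly to the first set but partly to the second, so a small amount of noise near the threshold shifts memberships gradually instead of flipping the example from one branch to the other.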
The soft discretization is defined with three parameters/functions: one is the cross point $T$, and the other two are the membership functions of the fuzzy set pair $A_1$ and $A_2$, with $A_1(a) + A_2(a) = 1$. The cross point $T$, i.e., the localization of the soft discretization, is determined based on whether it can maximize the information gain in classification, and the membership functions of the fuzzy set pair are determined according to the characteristics of the attribute data, such as the uncertainty of the associated attribute.
Building on VFDT and VFDTc and improving the soft discretization method, Wang et al. propose a system called fVFDT [36]. Focusing on continuous attributes, they have developed and evaluated a new technique named TBST to insert new examples and calculate the best split-test point efficiently. It builds threaded binary search trees, and its processing time for value insertion is $O(n \log n)$. Compared to the method used in VFDTc, it improves the processing time for calculating the best split-test point from $O(n \log n)$ to $O(n)$. As for noisy data, they improve the soft discretization method from the traditional data mining field, so that fVFDT can deal with noisy data efficiently and improve the classification accuracy.

5. Conclusions
FDT combines fuzzy theory with classical decision trees in order to learn a classification model which is able to handle vagueness and is also comprehensible. In the past, different authors have introduced several variants of fuzzy decision trees. Boyen and Wehenkel [1] presented the automatic induction of binary fuzzy trees based on a new class of discrimination quality measures. Janikow adapted the well-known ID3 algorithm so that it works with fuzzy sets [2].
To the best of our knowledge, in spite of these papers, there has been no survey of this field. The aim of this paper is to give a global overview of FDT. We categorize the main research issues of FDT into three parts: attribute selection criteria, inference procedure and stopping criteria.
We also introduce incremental fuzzy decision tree issues for data stream mining.
After several years of intensive research, FDT classification has reached a somewhat mature state, and many quite sophisticated algorithms are now available. A significant improvement of the current quality level can hardly be expected. In the future, some other research issues of FDT should be paid more attention, such as incremental fuzzy decision trees and FDT for data stream mining [35,36].

Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 60573057, 60473057 and 90604007).

References
1. Boyen XP, Wehenkel L (1995) Fuzzy decision tree induction for power system security assessment. IFAC Symposium on Control of Power Plants and Power Systems. Mexico
2. Janikow CZ (1998) Fuzzy decision trees: issues and methods. IEEE Transactions on Systems, Man and Cybernetics 28(1):1-14. IEEE Press, Piscataway, NJ, USA
3. Quinlan JR (1993) C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo, CA, USA
4. Safavian SR, Landgrebe D (1991) A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man and Cybernetics 21(3):660-674
5. Olaru C, Wehenkel L (2003) A complete fuzzy decision tree technique. Fuzzy Sets and Systems 138:221-254
6. Ichihashi H, Shirai T, Nagasaka K, Miyoshi T (1999) Neuro fuzzy ID3: A method of inducing fuzzy decision trees with linear programming for maximizing entropy and algebraic methods. Fuzzy Sets and Systems 81(1):157-167
7. Xizhao W, Hong J (1998) On the handling of fuzziness for continuous valued attributes in decision tree generation. Fuzzy Sets and Systems 99:283-290
8. Pedrycz W, Sosnowski A (2000) Designing decision trees with the use of fuzzy granulation. IEEE Transactions on Systems, Man and Cybernetics 30:151-159
9. Yuan Y, Shaw MJ (1995) Induction of fuzzy decision trees. Fuzzy Sets and Systems 69:125-139
10. Wang X, Chen B, Qian G, Ye F (2000) On the optimization of fuzzy decision trees. Fuzzy Sets and Systems 112:117-125
11. Chiang IJ, Hsu JYJ (1996) Integration of fuzzy classifiers with decision trees. In Proc. Asian Fuzzy System. Sym:266-271
12. Hayashi I, Maeda T, Bastian A, Jain LC (1998) Generation of fuzzy decision trees by fuzzy ID3 with adjusting mechanism of and/or operators. In Proc. Int. Conf. Fuzzy System:681-685
13. Dong M, Kothari R (2001) Look-ahead based fuzzy decision tree induction. IEEE Transactions on Systems 9(3)
14. Janikow CZ (1996) Exemplar learning in fuzzy decision trees. Proceedings of FUZZY-IEEE:1500-
15. Bouchon Meunier B, Marsala C (2003) Measures of discrimination for the construction of fuzzy decision trees. In Proc. of the FIP'03 conference, Beijing, China:709-714
16. Wang X, Borgelt C (2004) Information measures in fuzzy decision trees. Proc. 13th IEEE International Conference on Fuzzy Systems 1:85-90
17. Jensen R, Shen Q (2004) Semantics-preserving dimensionality reduction: Rough and fuzzy-rough based approaches. IEEE Transactions on Knowledge and Data Engineering 16(12):1457-1471
18. Dubois D, Prade H (1980) Fuzzy sets and systems: Theory and applications. Academic Press, New York
19. Guetova M, Hölldobler S, Störr H (2002) Incremental fuzzy decision trees. 25th German Conference on Artificial Intelligence
20. Zeidler J, Schlossor M (1996) Continuous-valued attributes in fuzzy decision trees. IPMU 96:395-
21. Janikow CZ, Faifer M (1999) Fuzzy partitioning with FID3.1. Proc. of the 18th International Conference of the North American Fuzzy Information Processing Society. New York
22. Jensen R, Shen Q (2005) Fuzzy-rough feature significance for fuzzy decision trees. UKCI
23. Pedrycz W, Sosnowski ZA (2005) C-fuzzy decision trees. IEEE Transactions on Systems, Man and Cybernetics, Part C, No. 4:498-511
24. Kohonen T (1989) Self-organization and associative memory. Springer-Verlag, Berlin, Germany
25. Peng Y, Flach P (2001) Soft discretization to enhance the continuous decision tree induction. In: Giraud-Carrier C, Lavrac N, Moyle S (eds) Aspects of Data Mining, Decision Support and Meta-Learning:109-118. ECML/PKDD'01 workshop notes
26. Boyen X, Wehenkel L (1999) Automatic induction of fuzzy decision trees and its application to power system security assessment. Fuzzy Sets and Systems 102:3-19
27. Marsala C, Bouchon Meunier B (2003) Choice of a method for the construction of fuzzy decision trees. Proc. 12th IEEE Int. Conf. on Fuzzy Systems, St. Louis, MO, USA
28. Nauck D, Klawonn F, Kruse R (1997) Foundations of neuro-fuzzy systems. Wiley and Sons, Chichester, UK
29. Kruse R, Nauck D, Borgelt C (1999) Data mining with fuzzy methods: status and perspectives. EUFIT'99
30. Hullermeier E (2005) Fuzzy methods in machine learning and data mining: status and prospects. Fuzzy Sets and Systems 156(3):387-407
31. Olaru C, Wehenkel L (2000) On neurofuzzy and fuzzy decision tree approaches. In: Bouchon-Meunier B, Yager RR, Zadeh LA (eds) Uncertainty and Fusion. Kluwer Academic Publishers:131-145
32. Mitra S, Kishori MK, Sankar KP (2002) Fuzzy decision tree, linguistic rules and fuzzy knowledge-based network: generation and evaluation. IEEE Transactions on Systems, Man and Cybernetics 32(4)
33. Quinlan JR (1986) Induction of decision trees. Machine Learning 1:81-106
34. Hong TP, Lee CY (1998) Learning fuzzy knowledge from training examples. Proceedings of the Seventh International Conference on Information and Knowledge Management, Bethesda, Maryland, United States:161-166
35. Ben David, Gehrke J, Kifer D (2004) Detecting change in data streams. In Proceedings of VLDB:1134-1145
36. Wang T, Li ZJ, Yan YJ, Chen HW (2008) An incremental fuzzy decision tree classification method for mining data streams. In Proceedings of MLDM:91-103
Journal
Fuzzy Information and Engineering
– Taylor & Francis
Published: Jun 1, 2009
Keywords: Fuzzy decision tree; Classifier; Attribute selection; Decision assignment; Stopping criteria