On the classification performance of TAN and general Bayesian networks
Michael G. Madden
College of Engineering and Informatics, National University of Ireland, University Road, Galway, Ireland
Available online 9 January 2009
Over a decade ago, Friedman et al. introduced the Tree Augmented Naïve Bayes (TAN) classifier, with
experiments indicating that it significantly outperformed Naïve Bayes (NB) in terms of classification
accuracy, whereas general Bayesian network (GBN) classifiers performed no better than NB. This paper
challenges those claims, using a careful experimental analysis to show that GBN classifiers significantly
outperform NB on the datasets analyzed, and are comparable to TAN in performance. It is found that the poor
performance reported by Friedman et al. is not attributable to the GBN per se, but rather to their use of
simple empirical frequencies to estimate GBN parameters, whereas basic parameter smoothing (used in
their TAN analyses but not their GBN analyses) improves GBN performance significantly. It is concluded
that, while GBN classifiers may have some limitations, they deserve greater attention, particularly in
domains where insight into classification decisions, as well as good accuracy, is required.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction
This paper examines the performance of Bayesian networks as
classifiers, comparing their performance to that of the Naïve Bayes
(NB) classifier and the Tree Augmented Naïve Bayes (TAN) classifier,
both of which make strong assumptions about interactions between
domain variables.
In the experiments performed for this work, described below in
Section 3, standard Bayesian networks (referred to as General
Bayesian Networks, GBNs, to distinguish them from NB and TAN)
are compared with NB and TAN classifiers on 28 standard benchmark
datasets. Our experiments indicate that the GBN classifier is
substantially better than NB, with performance closer to that of
TAN. This contrasts with the conclusions drawn in the landmark paper
on Bayesian network classifiers by Friedman et al. That paper
presented results on many of the same datasets, showing that
GBNs constructed using the minimum description length (MDL)
score tend to perform no better than NB. That result has been
widely noted by other authors (e.g. [16,18]); in one case the result
was interpreted as indicating that NB "easily outperforms" GBN.
Our contention is that it has become 'accepted wisdom' that
GBN classification performance is no better than that of NB, and
significantly worse than TAN (ignoring other considerations such
as computational complexity or interpretability). Our results indicate
that GBN's classification performance is superior to that of NB
and much closer to that of TAN, when the same parameter estimation
procedure is used for all.
It turns out that Friedman et al. used simple frequency counts
for parameter estimation in constructing GBN classifiers, whereas
they used parameter smoothing in constructing TAN classifiers
(see Section 2.3 for details). Our experiments show that if frequency
counts are used for both GBN and TAN, neither is much better
than NB (Section 3.3, Fig. 5), but if parameter smoothing is used
for both, they both perform similarly well (Fig. 4). Furthermore,
since GBN classifiers are commonly constructed through heuristic
search, it is possible for improved GBN construction algorithms
to lead to improved performance.
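The difference between frequency counts and smoothed estimates can be illustrated with a minimal sketch. It assumes a generic additive (pseudo-count) smoothing scheme, which is not necessarily the smoothing procedure used in this paper or by Friedman et al. The function name `estimate_distribution`, the parameter `alpha`, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def estimate_distribution(values, states, alpha=0.0):
    """Estimate P(x = state) for each state from a list of observed values.

    alpha = 0 yields raw empirical frequencies.
    alpha > 0 adds a pseudo-count to every state (additive smoothing);
    this is an illustrative assumption, not the paper's exact scheme.
    """
    counts = Counter(values)
    total = len(values) + alpha * len(states)
    if total == 0:
        raise ValueError("no observations and no pseudo-counts")
    return {s: (counts[s] + alpha) / total for s in states}


# Hypothetical case: a parent configuration observed only three times,
# always with the same value of the child variable.
observed = ["yes", "yes", "yes"]
states = ["yes", "no"]

raw = estimate_distribution(observed, states)             # frequency counts
smoothed = estimate_distribution(observed, states, 1.0)   # pseudo-count of 1

print(raw)
print(smoothed)
```

With raw frequencies, the unseen value receives probability zero, so any test instance containing it is assigned zero joint probability by the network. Sparse counts of this kind become more likely as a node acquires more parents, which is one plausible reading of why the choice of estimator matters for GBN and TAN classifiers.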
The structure of the paper is as follows. Section 2 reviews
Bayesian networks and the algorithms for constructing GBN and
TAN classifiers that are used in this paper. Section 3 presents
experiments applying NB, TAN and two GBN algorithms to classification
problems on 28 standard datasets, and identifies why the
results of this paper are at odds with those of Friedman et al. as
mentioned above. Finally, Section 4 draws general conclusions
about the suitability of GBNs as classifiers.
2. Bayesian networks and classification
As is well known, a Bayesian network is composed of the network
structure and its conditional probabilities. The structure $B_S$
is a directed acyclic graph where the nodes correspond to domain
variables $x_1, \ldots, x_n$ and the arcs between nodes represent direct
dependencies between the variables. Likewise, the absence of an
arc between two nodes $x_i$ and $x_j$ represents that $x_j$ is independent
of $x_i$ given its parents in $B_S$. Using the notation of Cooper and
Herskovits, the set of parents of a node $x_i$ in $B_S$ is denoted $\pi_i$. The
structure is annotated with a set of conditional probabilities, $B_P$
Knowledge-Based Systems 22 (2009) 489–495