On the classification performance of TAN and general Bayesian networks
Michael G. Madden *
College of Engineering and Informatics, National University of Ireland, University Road, Galway, Ireland
article info
Article history:
Available online 9 January 2009
Keywords:
Bayesian networks
TAN
Naïve Bayes
Classification
Inductive learning
Parameter estimation
abstract
Over a decade ago, Friedman et al. introduced the Tree Augmented Naïve Bayes (TAN) classifier, with
experiments indicating that it significantly outperformed Naïve Bayes (NB) in terms of classification
accuracy, whereas general Bayesian network (GBN) classifiers performed no better than NB. This paper
challenges those claims, using a careful experimental analysis to show that GBN classifiers significantly
outperform NB on the datasets analyzed and are comparable to TAN in performance. It is found that the poor
performance reported by Friedman et al. is not attributable to the GBN per se, but rather to their use of
simple empirical frequencies to estimate GBN parameters, whereas basic parameter smoothing (used in
their TAN analyses but not their GBN analyses) improves GBN performance significantly. It is concluded
that, while GBN classifiers may have some limitations, they deserve greater attention, particularly in
domains where insight into classification decisions, as well as good accuracy, is required.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction
This paper examines the performance of Bayesian networks as
classifiers, comparing their performance to that of the Naïve Bayes
(NB) classifier and the Tree Augmented Naïve Bayes (TAN) classifier,
both of which make strong assumptions about interactions between
domain variables.
In the experiments performed for this work, described below in
Section 3, standard Bayesian networks (referred to as General
Bayesian Networks, GBNs, to distinguish them from NB and TAN)
are compared with NB and TAN classifiers on 28 standard benchmark
datasets. Our experiments indicate that the GBN classifier is
substantially better than NB, with performance closer to that of
TAN. This contrasts with the conclusions drawn in the landmark paper
on Bayesian network classifiers by Friedman et al. [14]. That paper
presented results on many of the same datasets, showing that
GBNs constructed using the minimum description length (MDL)
score tend to perform no better than NB. That result has been
widely noted by other authors (e.g. [16,18]); in one case the result
was interpreted as indicating that NB “easily outperforms” GBN.
Our contention is that it has become ‘accepted wisdom’ that
GBN classification performance is no better than that of NB, and
significantly worse than TAN (ignoring other considerations such
as computational complexity or interpretability). Our results indicate
that GBN’s classification performance is superior to that of NB
and much closer to that of TAN, when the same parameter estimation
procedure is used for all.
The discrepancy arises because Friedman et al. used simple frequency
counts for parameter estimation in constructing GBN classifiers, whereas
they used parameter smoothing in constructing TAN classifiers
(see Section 2.3 for details). Our experiments show that if frequency
counts are used for both GBN and TAN, neither is much better
than NB (Section 3.3, Fig. 5), but if parameter smoothing is used
for both, they perform similarly well (Fig. 4). Furthermore,
since GBN classifiers are commonly constructed through heuristic
search, improved GBN construction algorithms may lead to
further improvements in performance.
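The difference between the two estimation strategies can be illustrated with a minimal Python sketch. It contrasts plain empirical frequencies with generic additive (Laplace-style) smoothing; this is a stand-in chosen for illustration and is not necessarily the smoothing scheme described in Section 2.3. The function names, the pseudo-count `alpha`, and the toy data are all hypothetical.

```python
from collections import Counter


def frequency_estimate(values, outcomes):
    """Plain empirical frequencies: count / total.

    Outcomes that never occur in `values` get probability exactly 0.
    Assumes `values` is non-empty.
    """
    counts = Counter(values)
    total = len(values)
    return {v: counts[v] / total for v in outcomes}


def smoothed_estimate(values, outcomes, alpha=1.0):
    """Additive smoothing: (count + alpha) / (total + alpha * k),
    where k is the number of possible outcomes.

    `alpha` is an illustrative pseudo-count, not a value from the paper.
    """
    counts = Counter(values)
    total = len(values)
    k = len(outcomes)
    return {v: (counts[v] + alpha) / (total + alpha * k) for v in outcomes}


# Hypothetical toy sample: outcome "c" is possible but never observed.
observed = ["a", "a", "b"]
outcomes = ["a", "b", "c"]

freq = frequency_estimate(observed, outcomes)
smooth = smoothed_estimate(observed, outcomes)
```

With plain frequencies the unseen outcome receives probability zero, and a single zero factor drives the product of conditional probabilities for a class to zero. Smoothing keeps every estimate strictly positive, which matters most when a conditional distribution is estimated from only a few training instances.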
The structure of the paper is as follows. Section 2 reviews
Bayesian networks and the algorithms for constructing GBN and
TAN classifiers that are used in this paper. Section 3 presents
experiments applying NB, TAN and two GBN algorithms to classification
problems on 28 standard datasets, and identifies why the
results of this paper are at odds with those of Friedman et al. as
mentioned above. Finally, Section 4 draws general conclusions
about the suitability of GBNs as classifiers.
2. Bayesian networks and classification
As is well known, a Bayesian network is composed of the network
structure and its conditional probabilities. The structure $B_S$
is a directed acyclic graph where the nodes correspond to domain
variables $x_1, \ldots, x_n$ and the arcs between nodes represent direct
dependencies between the variables. Likewise, the absence of an
arc between two nodes $x_1$ and $x_2$ represents that $x_2$ is independent
of $x_1$ given its parents in $B_S$. Using the notation of Cooper and
Herskovits [12], the set of parents of a node $x_i$ in $B_S$ is denoted
$\pi_i$. The structure is annotated with a set of conditional
probabilities, $B_P$,
0950-7051/$ - see front matter © 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.knosys.2008.10.006
* Tel.: +35391493797; fax: +35391444214.
E-mail address: michael.madden@nuigalway.ie.
Knowledge-Based Systems 22 (2009) 489–495