The VLDB Journal (2006)
Dan Pelleg · Andrew Moore
Dependency trees in sub-linear time and bounded memory
Received: 23 February 2005 / Accepted: 21 April 2005 / Published online: 2 February 2006
Abstract We focus on the problem of efficient learning of dependency trees. Once grown, they can be used as a special case of a Bayesian network, for probability density function (PDF) approximation, and for many other uses. Given the data, a well-known algorithm can fit an optimal tree in time that is quadratic in the number of attributes and linear in the number of records. We show how to modify it to exploit partial knowledge about edge weights. Experimental results show running time that is near-constant in the number of records, without significant loss in accuracy of the generated trees.
Keywords Data mining · Probably approximately correct
learning · Fast algorithms · Dependency trees
Bayesian networks are a popular class of very general models. They are widely used for data modeling, for inference, and for PDF approximation. They are also appealing from the cognitive aspect, as their structure can often be visualized and easily understood. However, because of their expressiveness, they are hard to fit from data, requiring search in a super-exponential space of possible graph structures. Despite recent advances [9, 10], learning network structure from big data sets demands huge computational resources.
Our approach restricts the search space to a more tractable one by considering only a simpler sub-class of graphical models. Specifically, we focus on trees. For trees, the well-known Chow–Liu algorithm can find optimal solutions in polynomial time. As an added feature, the trees can be described
more simply to human users. Below, we show how to modify the known algorithm so it runs in time that is sub-linear in the input size, using a user-specified amount of memory. Empirical evidence shows run time that is linear in the number of attributes and constant in the input size; the constant depends only on intrinsic properties of the data. This allows processing of very large data sets. We also examine and quantify the possible loss of accuracy, and show that it is negligible for most practical purposes.

Work done at Carnegie-Mellon University. This research was sponsored by the National Science Foundation (NSF) under grants no. ACI-0121671 and no. DMS-9873442.

D. Pelleg: IBM Haifa Labs, Haifa, Israel
A. Moore: Robotics Institute, Carnegie-Mellon University
More precisely, dependency trees are belief networks that satisfy the additional constraint that each node has at most one parent. It has been shown that the tree maximizing the data likelihood can be found as follows. First, construct a full graph where each node corresponds to an attribute in the input data. Next, assign edge weights; these are derived from the mutual-information values of the corresponding attribute pairs. Finally, run a minimum-spanning-tree algorithm on the weighted graph. The output tree is the desired one.
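The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes discrete attributes, estimates mutual information from empirical counts, and uses Kruskal's rule for the spanning tree. The names `mutual_information` and `fit_dependency_tree` are ours.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(col_a, col_b):
    """Empirical mutual information (in nats) of two discrete columns."""
    n = len(col_a)
    pa = Counter(col_a)
    pb = Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in pab.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi

def fit_dependency_tree(records):
    """records: list of equal-length tuples; returns the tree's edge list."""
    m = len(records[0])
    cols = list(zip(*records))
    # Steps 1-2: full graph with mutual-information edge weights.
    # This is the quadratic-in-attributes, linear-in-records part.
    edges = [(mutual_information(cols[i], cols[j]), i, j)
             for i, j in combinations(range(m), 2)]
    # Step 3: maximum spanning tree, here via Kruskal's rule on
    # edges taken in descending weight order, with union-find.
    parent = list(range(m))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in sorted(edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

On a toy data set where attributes 0 and 1 are perfectly correlated and attribute 2 is independent of both, the edge (0, 1) carries the highest mutual information and is guaranteed to appear in the fitted tree.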
Besides being a “lighter” version of Bayesian networks, dependency trees are also interesting in their own right. They form a complete representation. Additionally, they can act as initializers for search, as mixture components, or as components in classifiers.
Once the weight matrix is constructed, executing a minimum spanning tree (MST) algorithm is fast. The time-consuming part is the population of the weight matrix, which takes time quadratic in the number of attributes and linear in the number of records. This becomes expensive when considering datasets with hundreds of thousands of records and hundreds of attributes.
To overcome this problem, we propose a new way of interleaving the spanning-tree construction with the operations needed to compute the mutual-information coefficients. We develop a new spanning-tree algorithm, based solely on Tarjan's red-edge rule. This algorithm is capable of

To be precise, we will use it as a maximum-spanning-tree algorithm. The two are interchangeable, requiring just a reversal of the edge-weight comparison operator. Historically, “minimum” has been by far the more popular name.
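The footnote's equivalence is easy to see in code: negating every edge weight turns any minimum-spanning-tree routine into a maximum-spanning-tree one, which is the same as reversing the comparison operator. A small illustrative sketch (the helper names are ours, not from the paper):

```python
def min_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm; edges are (weight, u, v) tuples."""
    parent = list(range(num_nodes))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

def max_spanning_tree(num_nodes, edges):
    # Negate the weights, run the minimum version, negate back.
    negated = [(-w, u, v) for w, u, v in edges]
    return [(-w, u, v) for w, u, v in min_spanning_tree(num_nodes, negated)]
```

On a triangle with weights 1, 2, and 3, the maximum spanning tree keeps the edges of weight 3 and 2, exactly as if the comparison in Kruskal's loop had been reversed.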