The VLDB Journal (2013) 22:345–368
High efficiency and quality: large graphs matching
Yuanyuan Zhu · Lu Qin · Jeffrey Xu Yu ·
Yiping Ke · Xuemin Lin
Received: 19 September 2011 / Revised: 8 August 2012 / Accepted: 19 August 2012 / Published online: 25 September 2012
© Springer-Verlag 2012
Abstract Graph matching plays an essential role in many
real applications. In this paper, we study how to match two
large graphs by maximizing the number of matched edges,
which is known as maximum common subgraph matching
and is NP-hard. Exact matching algorithms cannot handle
graphs with more than 30 nodes, while approximate matching
algorithms can produce matchings of very poor quality. We
propose a novel two-step approach that can efficiently match
two large graphs with thousands of nodes while achieving
high matching quality. In the first step, we propose an
anchor-selection/expansion approach to compute a good
initial matching. In the second step, we propose a new
approach to refine the initial matching. We establish the
optimality of our refinement and discuss how to randomly
refine the matching with different combinations. We further
show how to extend our solution to handle labeled graphs.
We conducted extensive experiments on real and synthetic
datasets and report our findings in this paper.
Keywords Graph matching · Maximum common
subgraph · Vertex cover
Y. Zhu · L. Qin · J. X. Yu (✉) · Y. Ke
The Chinese University of Hong Kong, Sha Tin, Hong Kong, China
Y. Zhu
Y. Ke
University of New South Wales, Sydney, NSW, Australia
NICTA, Sydney, NSW, Australia
Graphs proliferate in a wide variety of applications, including
social networks in psycho-sociology, attributed graphs in
image processing, food chains in ecology, electrical circuits
in electrical engineering, road networks in transportation,
protein interaction networks in biology, and topological
networks on the Web. Graph processing has attracted great
attention from both the research and industrial communities.
Graph matching is an important type of graph processing
that aims to find correspondences between the nodes/edges
of two graphs such that some substructures in one graph are
mapped to similar substructures in the other. Graph matching
plays an essential role in a large number of concrete
applications.
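To make the objective stated in the abstract concrete (maximizing the number of matched edges), the following minimal sketch counts matched edges under a node mapping and finds the best mapping by exhaustive search. It is plain brute force, not the anchor-selection/expansion approach proposed in this paper, and the function names `matched_edges` and `exact_mcs_edges` are hypothetical. It assumes simple, undirected, unlabeled graphs given as edge lists, with |V1| ≤ |V2|. Because it enumerates all |V2|!/(|V2| - |V1|)! injective mappings, it mainly illustrates why exact matching breaks down long before graphs reach thousands of nodes.

```python
from itertools import permutations


def matched_edges(edges1, edges2, mapping):
    """Count edges (u, v) of G1 whose images (mapping[u], mapping[v]) are edges of G2.

    Edges are treated as undirected, so each is stored as a frozenset.
    """
    e2 = {frozenset(e) for e in edges2}
    return sum(
        1
        for u, v in edges1
        if u in mapping
        and v in mapping
        and frozenset((mapping[u], mapping[v])) in e2
    )


def exact_mcs_edges(nodes1, edges1, nodes2, edges2):
    """Brute-force the maximum number of matched edges over every injective
    mapping of nodes1 into nodes2 (hypothetical helper, exponential time)."""
    if len(nodes1) > len(nodes2):
        raise ValueError("this sketch assumes |V1| <= |V2|")
    best, best_map = -1, None
    # Each ordered selection of len(nodes1) distinct G2 nodes is one mapping.
    for image in permutations(nodes2, len(nodes1)):
        mapping = dict(zip(nodes1, image))
        score = matched_edges(edges1, edges2, mapping)
        if score > best:
            best, best_map = score, mapping
    return best, best_map


if __name__ == "__main__":
    # Toy example: a triangle (G1) matched against a 4-node path (G2).
    tri_nodes = ["a", "b", "c"]
    tri_edges = [("a", "b"), ("b", "c"), ("a", "c")]
    path_nodes = [1, 2, 3, 4]
    path_edges = [(1, 2), (2, 3), (3, 4)]
    best, mapping = exact_mcs_edges(tri_nodes, tri_edges, path_nodes, path_edges)
    print(best, mapping)  # prints: 2 {'a': 1, 'b': 2, 'c': 3}
```

Since the path contains no triangle, at most two of the three triangle edges can be matched. Even this 3-into-4 toy case already requires 24 mappings, and the count grows factorially with graph size.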
Biology: Protein–protein interaction (PPI) networks play an
important role in most biological processes. In a PPI network,
a node corresponds to a protein and an edge indicates an
interaction between two proteins. Comparative analysis of PPI
networks across species provides insight into the similarities
and differences between species at the systems level, and
helps to identify conserved functional components across
species. Graph matching can be used effectively for such
PPI network comparisons, to identify pairs of homologous
proteins from two different organisms such that as many
PPIs as possible are conserved between matched pairs [28,37].
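In the PPI setting, the quantity being maximized is the number of interactions conserved under the protein matching. The minimal sketch below lists the conserved interactions for a given matching. It assumes undirected interactions given as protein pairs and a one-to-one dict `homolog` from species-1 proteins to species-2 proteins. The function name `conserved_interactions` and the protein identifiers are hypothetical placeholders, not data from [28,37]. The sketch only scores a matching. Finding a good matching is the hard problem this paper addresses.

```python
def conserved_interactions(ppi1, ppi2, homolog):
    """Return the interactions of species 1 whose matched protein pairs
    also interact in species 2 (hypothetical helper).

    ppi1, ppi2: iterables of (protein, protein) pairs, treated as undirected.
    homolog: dict mapping a species-1 protein to its matched species-2 protein.
    """
    # A matching pairs each protein with at most one partner.
    if len(set(homolog.values())) != len(homolog):
        raise ValueError("homolog mapping is not one-to-one")
    edges2 = {frozenset(pair) for pair in ppi2}
    return [
        (p, q)
        for p, q in ppi1
        if p in homolog
        and q in homolog
        and frozenset((homolog[p], homolog[q])) in edges2
    ]


if __name__ == "__main__":
    # Hypothetical toy networks; names are placeholders, not real proteins.
    ppi_a = [("A1", "A2"), ("A2", "A3"), ("A3", "A4")]
    ppi_b = [("B1", "B2"), ("B2", "B3"), ("B1", "B4")]
    homolog = {"A1": "B1", "A2": "B2", "A3": "B3", "A4": "B5"}
    conserved = conserved_interactions(ppi_a, ppi_b, homolog)
    print(conserved)  # prints: [('A1', 'A2'), ('A2', 'A3')]
    print(len(conserved), "of", len(ppi_a))  # prints: 2 of 3
```

Here the interaction (A3, A4) is not conserved because its image (B3, B5) does not appear among the species-2 interactions.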
Biochemistry: The genome of an organism is represented
as a graph with genes as nodes and binary relations between
genes as edges, and the metabolic pathway is represented
as another graph with enzymes as nodes and chemical
compounds as edges. These two graphs are then matched to
identify FRECs (functionally related enzyme clusters), which
reveal important biological features of the organisms.
Medicine: The electroencephalogram (EEG) signal can be
transformed into a graph with the extracted energy bursts as