The VLDB Journal (2005) 14: 30–49 / Digital Object Identifier (DOI) 10.1007/s00778-003-0113-1
Storing and querying XML data using denormalized relational databases
Andrey Balmin, Yannis Papakonstantinou
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093
Edited by A. Halevy. Received: December 21, 2001 / Accepted: July 1, 2003
Published online: June 23, 2004
Abstract. XML database systems emerge as a result of the acceptance of the XML data model. Recent works have followed the promising approach of building XML database management systems on underlying RDBMS’s. Achieving good query processing performance reduces to two questions: (i) How should the XML data be decomposed into data that are stored in the RDBMS? (ii) How should the XML query be translated into an efficient plan that sends one or more SQL queries to the underlying RDBMS and combines the data into the XML result? We provide a formal framework for XML Schema-driven decompositions, which encompasses the decompositions proposed in prior work and extends them with decompositions that employ denormalized tables and binary-coded XML fragments. We provide corresponding query processing algorithms that translate the XML query conditions into conditions on the relational tables and assemble the decomposed data into the XML query result. Our key performance focus is the response time for delivering the first results of a query. The most effective of the described decompositions have been implemented in XCacheDB, an XML DBMS built on top of a commercial RDBMS, which serves as our experimental basis. We present experiments and analysis that point to a class of decompositions, called inlined decompositions, that improve query performance for full results and first results, without a significant increase in the size of the database.
The acceptance and expansion of the XML model create a need for XML database systems [3,4,8,10,15,19,23,25,31,32,34,35,41]. One approach towards building XML DBMS’s is based on leveraging an underlying RDBMS for storing and querying the XML data. This approach allows the XML database to take advantage of mature relational technology, which provides reliability, scalability, high-performance indices, concurrency control and other advanced functionality.
Andrey Balmin has been supported by NSF IRI-9734548.
The authors built the XCacheDB system while on leave at Enosys
Software, Inc., during 2000.
Fig. 1. The XML database architecture
We provide a formal framework for XML Schema-driven decompositions of the XML data into relational data. The framework encompasses the decompositions described in prior work on XML Schema-driven decompositions [3,34] and extends that work with a wide range of decompositions that employ denormalized tables and binary-coded non-atomic XML fragments.
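To make the contrast concrete, the following minimal sketch stores the same small document in two ways: a normalized decomposition with one table per element type, and a denormalized table that repeats the parent data next to each child and also keeps the child fragment as a BLOB. Everything here is an illustrative assumption rather than XCacheDB's actual layout: the sample document, the table and column names, the use of SQLite, and the use of serialized XML text as a stand-in for a binary-coded fragment.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the paper.
DOC = """<customers>
  <customer><name>Ann</name>
    <order><id>1</id><item>pen</item></order>
    <order><id>2</id><item>ink</item></order>
  </customer>
  <customer><name>Bob</name>
    <order><id>3</id><item>pad</item></order>
  </customer>
</customers>"""


def load(doc):
    """Decompose the document into both layouts inside one in-memory database."""
    db = sqlite3.connect(":memory:")
    # Normalized decomposition: one table per element type, linked by keys.
    db.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
               "cust_id INTEGER, item TEXT)")
    # Denormalized decomposition: customer and order data share one table,
    # and each row also carries the serialized order fragment as a BLOB.
    db.execute("CREATE TABLE customer_order (cust_id INTEGER, name TEXT, "
               "order_id INTEGER, item TEXT, order_xml BLOB)")
    customers = ET.fromstring(doc).findall("customer")
    for cust_id, cust in enumerate(customers, start=1):
        name = cust.findtext("name")
        db.execute("INSERT INTO customer VALUES (?, ?)", (cust_id, name))
        for order in cust.findall("order"):
            order_id = int(order.findtext("id"))
            item = order.findtext("item")
            db.execute("INSERT INTO orders VALUES (?, ?, ?)",
                       (order_id, cust_id, item))
            db.execute("INSERT INTO customer_order VALUES (?, ?, ?, ?, ?)",
                       (cust_id, name, order_id, item, ET.tostring(order)))
    return db


def orders_normalized(db, name):
    """Answer the query with a join across the two normalized tables."""
    rows = db.execute(
        "SELECT o.order_id, o.item FROM customer c "
        "JOIN orders o ON o.cust_id = c.cust_id "
        "WHERE c.name = ? ORDER BY o.order_id", (name,))
    return rows.fetchall()


def orders_denormalized(db, name):
    """Answer the same query from one table, rebuilding items from the BLOB."""
    rows = db.execute(
        "SELECT order_id, order_xml FROM customer_order "
        "WHERE name = ? ORDER BY order_id", (name,))
    return [(oid, ET.fromstring(blob).findtext("item")) for oid, blob in rows]
```

Calling `load(DOC)` and then either query function with a customer name returns the same list of orders. The denormalized variant avoids the join and can hand back a stored fragment directly, at the cost of repeating the customer name in every order row.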
The most effective of the described decompositions have been implemented in XCacheDB, an XML DBMS built on top of a commercial RDBMS.
XCacheDB follows the typical architecture (see Fig. 1) of an XML database built on top of an RDBMS [3,8,23,32,34]. First, XML data, accompanied by their XML Schema, are loaded into the database using the XCacheDB loader, which consists of two modules: the schema processor and the data decomposer. The schema processor takes the XML Schema as input and creates in the underlying relational database the tables required to store any document conforming to the given XML Schema. The conversion of the XML Schema into a relational schema may use optional user guidance. The mapping from the XML