The VLDB Journal (2003) 12: 87–88 / Digital Object Identifier (DOI) 10.1007/s00778-003-0092-2
Special issue: Best papers of VLDB 2002
Published online: July 8, 2003
This special issue comprises six papers from the 28th Inter-
national Conference on Very Large Data Bases, held in Hong
Kong, 20–23 August 2002. There were 69 research papers
presented at the Conference, selected from 431 submissions.
From these 69 papers, the three program chairs selected six
as the best of the conference. The authors of these six papers
produced extended versions of their conference papers, which
were further developed through two rounds of reviewing.
Recent VLDB conferences have striven to broaden the
range of topics covered at the conference beyond database
system mechanisms to information management in general.
To strengthen this trend, the VLDB Endowment Board of
Trustees recommended a new program committee structure
organized by topic rather than geography. VLDB 2002 was
the first VLDB conference to adopt this structure, which had
one committee for core database technology and another for
infrastructure for information systems. They handled 209 and
222 submissions, respectively.
Two of the papers in this special issue explore the core
database topics of physical data organization and indexing.
The other four papers cover topics that reflect the increased
breadth of the conference itself. Two of the papers address the
problem of processing queries against streams of data, rather
than against a snapshot of data stored on disk. This research
problem has become popular in recent years, motivated in
large part by the prospect of large numbers of sensor devices
delivering streams of data that need to be processed in real
time. The other two papers explore techniques for watermarking relational data to deter data piracy and the problem of text classification.
Two popular techniques for the organization of physical database tables are column-wise layout, which is more efficient for evaluating predicates on a small number of columns, and row-wise layout, which is more efficient for retrieving and updating complete rows. In “A case for fractured mirrors,” Ravishankar Ramamurthy, David J. DeWitt and Qi Su propose to store each partition of a table column-wise in one mirror and row-wise in the other, thereby getting the benefits of both layouts.
Large collections of regular expressions arise in a number of applications, including filtering of XML documents to reflect user interests expressed in XPath, content-based routing and classification of XML documents, and Internet routing protocols. An important task in such applications is to take a given string and find all regular expressions in the collection that it matches. The paper “RE-tree: an efficient index structure for regular expressions,” by Chee-Yong Chan, Minos Garofalakis and Rajeev Rastogi, describes and experimentally evaluates a novel indexing scheme for efficiently retrieving the regular expressions that match a given string.
Traditional database systems are built on assumptions that fail to hold in monitoring applications. Key characteristics of such applications include streams of information constantly arriving at the system (often in real time), large numbers of triggers defining the system’s reactions, historical data treated as first-class citizens, and imprecise data. The paper “Aurora: a new model and architecture for
data stream management” by Daniel J. Abadi, Don Carney,
Ugur Cetintemel, Mitch Cherniack, Christian Convey, Sang-
don Lee, Michael Stonebraker, Nesime Tatbul and Stan Zdonik
describes the architecture and basic operational features of Au-
rora, a stream data management system that is built to address
the special requirements of monitoring environments.
The problem of monitoring data streams is also the focus of
the paper “PSoup: a system for streaming queries over stream-
ing data” by Sirish Chandrasekaran and Michael J. Franklin.
In contrast to previous systems, PSoup treats data and queries
symmetrically, allowing new queries to examine old data and
old queries to monitor newly arriving data. The approach taken
is to view processing of multiple continuous queries as a join
of query and data streams.
Owners of intellectual property have been struggling with
the problem of piracy of digital information. While there has
been much press about the piracy of software, music and im-
ages, there has been less attention to the piracy of formatted
data in databases. The paper “Watermarking relational data:
framework, algorithms, and analysis” by Rakesh Agrawal,
Peter J. Haas and Jerry Kiernan addresses this problem by
systematically perturbing data values. The data modifications
make it possible to identify where the data came from but are
small enough to avoid decreasing the value of the data to users.
Support vector machines (SVMs) are among the most accurate approaches to text classification, but suffer from long
training times and large memory footprints, which prevent