The VLDB Journal (2004) 13: 317 / Digital Object Identifier (DOI) 10.1007/s00778-004-0141-5
Guest Editorial to the special issue on data stream processing
, Joseph M. Hellerstein
Department of Computer Science, Cornell University; Ithaca, NY 14853, USA; e-mail: firstname.lastname@example.org
Computer Science Division, University of California, Berkeley; Berkeley, CA 94720, USA; e-mail: email@example.com
Intel Research, Berkeley; Berkeley, CA 94704, USA
Published online: November 12, 2004
Data stream management techniques have been an active research
area in the database community for the past five years. In response
to our call for papers for this special issue, which had a deadline of
October 2003, we received 23 submissions covering a wide range
of ongoing data stream research. After two rounds of review,
we selected five papers that represent the diversity and depth
of this research. Early work on data streams concentrated on
developing efficient algorithms for specific data stream queries,
such as sampling, join size estimation, and quantiles. This
issue shows that data stream research has since matured, moving
beyond purely algorithmic work to address novel data types such
as XML as well as core systems issues.
The stream considered in the first paper consists of XML
user queries rather than traditional data records. The paper
considers how to efficiently mine frequent XML query patterns.
Because it is not feasible to keep all queries in main memory,
the authors give efficient algorithms for incrementally maintaining
the set of frequent user queries.
The second paper considers how a data stream management
system can deal with load spikes by carefully scheduling
operators in the system. The proposed scheduling method,
chain scheduling, keeps the output latency within a given
bound while minimizing the memory consumed by queues.
The third paper shows how to compute approximate answers to
aggregate queries over datasets undergoing constant change.
In particular, it focuses on streams that contain not only
insertions of new data but also deletions of previously inserted data.
The fourth paper is an experience paper. It reports the
latest lessons learned from the design and implementation of the
Aurora stream processing engine and outlines the authors'
vision for their next system.
The issue concludes with an article on data stream processing
in sensor networks. Sensor nodes differ from traditional
computers in that energy is one of their main limiting
factors. The authors propose two methods for saving energy. First,
they propose a group-aware network construction that minimizes
network traffic. Second, they allow queries to specify
that approximate results (within user-specified bounds)
are sufficient, which offers a further opportunity to reduce traffic.
Overall, we believe that these papers are an excellent snapshot
of the state of the data stream community as of early 2004,
and we hope that you will enjoy reading the papers as much
as we did.
Acknowledgements. We would like to thank Tamer Ozsu, our
editor-in-chief, for his advice and support throughout the process,
and we would like to thank Stacey Shirk for administrative support.
Our biggest thanks go to the authors whose contributions created
the issue that you are reading.
Joseph M. Hellerstein