The VLDB Journal 11: 68–91 (2002) / Digital Object Identifier (DOI) 10.1007/s007780200062
Query processing techniques for arrays
Arunprasad P. Marathe, Kenneth Salem
Department of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
Edited by M. Carey. Received: 10 August 2001 / Accepted: 11 December 2001
Published online: 24 May 2002
Abstract. Arrays are a common and important class of data.
At present, database systems do not provide adequate array
support: arrays can neither be easily defined nor conveniently
manipulated. Further, array manipulations are not optimized.
This paper describes a language called the Array Manipulation
Language (AML), for expressing array manipulations, and a
collection of optimization techniques for AML expressions.
In the AML framework for array manipulation, arbitrary
externally-defined functions can be applied to arrays in a
structured manner. AML can be adapted to different application
domains by choosing appropriate external function definitions.
This paper concentrates on arrays occurring in databases of
digital images such as satellite or medical images.
AML queries can be treated declaratively and subjected
to rewrite optimizations. Rewriting minimizes the number of
applications of potentially costly external functions required
to compute a query result. AML queries can also be optimized
for space. Query results are generated a piece at a time by
pipelined execution plans, and the amount of memory required
by a plan depends on the order in which pieces are generated.
An optimizer can consider generating the pieces of the query
result in a variety of orders, and can efficiently choose
orders that require less space. An AML-based prototype array
database system called ArrayDB has been built, and it is used
to show the effectiveness of these optimization techniques.
Key words: Array manipulation language – Array query
optimization – Declarative query language – User-defined
functions – Pipelined evaluation – Memory-usage optimization
Arrays are an appropriate model for many types of data,
including digital images, digital video, and gridded outputs from
computational models. Array data are common in many
application domains, such as remote sensing and medical
imaging [3,21,22]. Most database management systems (DBMS),
however, provide very limited support for arrays.

This research was partially supported by NSERC (Natural Sciences
and Engineering Research Council of Canada). We are grateful to the
anonymous reviewers for their comments and suggestions.
Present address: Open Text Corporation, 185 Columbia Street West,
Waterloo, Ontario N2L 5Z5, Canada

Fig. 1. A Thematic Mapper image and several derived images
(labels in the figure: seven-band Thematic Mapper image; band 3
image; band 4 image; dim. 2 (spectral))

This paper presents the Array Manipulation Language
(AML), which can be used to describe array queries. AML
expressions describe how arbitrary, externally defined
functions are used to generate a desired query result. Thus, by
appropriate choice of functions, AML can be customized for
a particular application. Because arrays may be large, and ar-
ray manipulations complex, array queries may be expensive.
This paper presents an array query processing algorithm that
generates optimized, pipelined query evaluation plans from
AML queries. The query processor is implemented in an array
database system called ArrayDB.
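
The following is a minimal sketch of why plan choice matters when
external functions are costly. It is not AML syntax and not
ArrayDB's plan generator: the operator names (scan, apply_rows,
select_rows), the CountingFunction wrapper, and the toy array are
all hypothetical, and the external function is assumed to act on
one element at a time, so that selecting rows before or after
applying it gives the same result. Each operator is a Python
generator, so rows flow through the plan one piece at a time.

```python
from typing import Callable, Iterable, Iterator, List

Row = List[int]


class CountingFunction:
    """Wraps a hypothetical external function and counts its applications."""

    def __init__(self, fn: Callable[[int], int]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, x: int) -> int:
        self.calls += 1
        return self.fn(x)


def scan(array: List[Row]) -> Iterator[Row]:
    """Produce the stored array one row (one 'piece') at a time."""
    for row in array:
        yield row


def apply_rows(rows: Iterable[Row], f: Callable[[int], int]) -> Iterator[Row]:
    """Apply an element-wise external function to each row as it arrives."""
    for row in rows:
        yield [f(x) for x in row]


def select_rows(rows: Iterable[Row], lo: int, hi: int) -> Iterator[Row]:
    """Keep only rows whose index i satisfies lo <= i < hi."""
    for i, row in enumerate(rows):
        if lo <= i < hi:
            yield row


# Toy 6 x 4 array with synthetic values (an assumption for illustration).
array = [[r * 10 + c for c in range(4)] for r in range(6)]

# Plan 1: apply the function everywhere, then keep rows 2 and 3.
naive_f = CountingFunction(lambda x: x * x)
naive = list(select_rows(apply_rows(scan(array), naive_f), 2, 4))

# Plan 2 (rewritten): keep rows 2 and 3 first, then apply the function.
rewritten_f = CountingFunction(lambda x: x * x)
rewritten = list(apply_rows(select_rows(scan(array), 2, 4), rewritten_f))

print(naive == rewritten, naive_f.calls, rewritten_f.calls)
```

Both plans yield the same two rows, but the rewritten plan invokes
the external function only on the elements that reach the result.
In either plan, only one row is materialized between operators at
any time.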
Figure 1 shows a remote sensing example that illustrates
the kind of array queries for which AML is well suited. The
three-dimensional array A in Fig. 1 represents a multi-spectral
image captured by the Landsat Thematic Mapper. Two of the
array dimensions are spatial, and the third is spectral. This
array can be thought of as a stack of seven two-dimensional
images of the same scene, each captured using a sensor sen-
sitive to a different band of the electromagnetic spectrum.
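
As a rough illustration of this layout, the sketch below models a
small three-dimensional array as nested Python lists, with
dimensions 0 and 1 spatial and dimension 2 spectral, and extracts
one band as a two-dimensional image. The array sizes, the synthetic
pixel values, and the helper names make_image and band_slice are
assumptions made for the example; they are not taken from the paper
or from real Thematic Mapper data.

```python
from typing import List

Image3D = List[List[List[int]]]
Image2D = List[List[int]]

# Hypothetical toy sizes; a real scene is far larger.
ROWS, COLS, BANDS = 4, 5, 7


def make_image(rows: int, cols: int, bands: int) -> Image3D:
    """Build a rows x cols x bands array with synthetic pixel values."""
    return [[[100 * b + 10 * r + c for b in range(bands)]
             for c in range(cols)]
            for r in range(rows)]


def band_slice(image: Image3D, band: int) -> Image2D:
    """Return the two-dimensional image for one band (0-based index)."""
    return [[pixel[band] for pixel in row] for row in image]


A = make_image(ROWS, COLS, BANDS)

# "Band 3" in 1-based numbering is index 2 along the spectral dimension.
band3 = band_slice(A, 2)
band4 = band_slice(A, 3)

print(len(band3), len(band3[0]))
```

Viewed this way, the stack of seven images is simply the set of
slices obtained by fixing each index along the spectral dimension.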
Often, multi-spectral images such as A are not used di-
rectly. Instead, useful parameters are derived from them. An