The VLDB Journal (2003) 12: 157–169 / Digital Object Identifier (DOI) 10.1007/s00778-003-0097-x
Watermarking relational data: framework, algorithms and analysis
Rakesh Agrawal, Peter J. Haas, Jerry Kiernan
IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA
Edited by P. Bernstein. Received: July 29, 2002 / Accepted: December 10, 2002
Published online: July 10, 2003
Abstract. We enunciate the need for watermarking database
relations to deter data piracy, identify the characteristics of
relational data that pose unique challenges for watermarking,
and delineate desirable properties of a watermarking system
for relational data. We then present an effective watermarking
technique geared for relational data. This technique ensures
that some bit positions of some of the attributes of some of
the tuples contain specific values. The specific bit locations
and values are algorithmically determined under the control
of a secret key known only to the owner of the data. This bit
pattern constitutes the watermark. Only if one has access to the
secret key can the watermark be detected with high probability.
Detecting the watermark requires access to neither the original
data nor the watermark, and the watermark can be easily and
efficiently maintained in the presence of insertions, updates,
and deletions. Our analysis shows that the proposed technique
is robust against various forms of malicious attacks as well as
benign updates to the data. Using an implementation running
on DB2, we also show that the algorithms perform well enough
to be used in real-world applications.
Keywords: Watermarking – Steganography – Information
hiding – Database
1 Introduction
The piracy of software, images, video, audio, and text has long
been a concern for owners of these digital assets. Protection
schemes are usually based upon the insertion of digital
watermarks into the data [5,10,12]. The watermarking software
introduces small errors into the object being watermarked.
These intentional errors are called marks, and all the marks
together constitute the watermark. The marks are chosen so
as to have an insignificant impact on the usefulness of the data
and are placed in such a way that a malicious user cannot
destroy them without making the data significantly less useful.
Although watermarking does not prevent illegal copying, it
deters such copying by providing a means for establishing the
original ownership of a redistributed copy.
(A preliminary version of this paper appeared in the Proceedings of
the 28th VLDB Conference, Hong Kong, China, 2002.)
The increasing use of databases in applications beyond
“behind-the-firewalls data processing” is creating a similar
need for watermarking databases. The Internet is exerting
tremendous pressure on data providers to create services that
allow users to search and access databases remotely. Although
this trend is a boon to end users, it exposes the data providers
to the threat of data theft. Providers are therefore demanding
technology for identifying pirated copies of their databases.
1.1 The need for new watermarking techniques
There is a rich body of literature on watermarking multimedia
data [5,10,12]. Most of these techniques were initially
developed for still images and later extended to video and
audio sources. While there is much to learn from this
literature, there are also new technical challenges because relational
and multimedia data differ in a number of important respects:
• A multimedia object consists of a large number of bits
with considerable redundancy. Thus, the watermark has a
large cover in which to hide. A database relation consists
of tuples, each of which represents a separate object. The
watermark needs to be spread over these separate objects.
• The relative spatial/temporal positioning of various pieces
of a multimedia object typically does not change. Tuples
of a relation, on the other hand, constitute a set, and there
is no implied ordering between them.
• Multimedia objects typically remain intact; portions of an
object cannot be dropped or replaced arbitrarily without
causing perceptual changes in the object. On the other
hand, tuple insertions, deletions, and updates are the norm
in the database setting.
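The second and third points suggest that a mark should depend on a tuple's own content rather than on its position in the relation. The following is a minimal sketch of that idea, not the algorithm developed in this paper: it assumes each tuple has an integer primary key and one integer attribute, uses HMAC-SHA256 as the keyed hash, and introduces the hypothetical names `mark_position`, `mark_relation`, `match_rate`, `fraction`, and `num_bits` purely for illustration.

```python
import hashlib
import hmac
import random


def keyed_hash(secret_key: bytes, primary_key: int) -> int:
    """Keyed hash of a primary key (HMAC-SHA256 is an assumption of this sketch)."""
    digest = hmac.new(secret_key, str(primary_key).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def mark_position(secret_key, primary_key, fraction, num_bits):
    """Return (bit_index, bit_value) if the tuple is selected for marking, else None."""
    h = keyed_hash(secret_key, primary_key)
    if h % fraction != 0:          # roughly one tuple in `fraction` is selected
        return None
    h //= fraction
    return h % num_bits, (h // num_bits) % 2


def mark_relation(relation, secret_key, fraction=2, num_bits=2):
    """Return a marked copy of `relation`, a list of (primary_key, value) pairs."""
    marked = []
    for primary_key, value in relation:
        position = mark_position(secret_key, primary_key, fraction, num_bits)
        if position is not None:
            bit_index, bit_value = position
            # Force the chosen low-order bit to the key-determined value.
            value = (value & ~(1 << bit_index)) | (bit_value << bit_index)
        marked.append((primary_key, value))
    return marked


def match_rate(relation, secret_key, fraction=2, num_bits=2):
    """Fraction of selected tuples whose chosen bit holds the expected value."""
    total = matches = 0
    for primary_key, value in relation:
        position = mark_position(secret_key, primary_key, fraction, num_bits)
        if position is not None:
            bit_index, bit_value = position
            total += 1
            matches += int(((value >> bit_index) & 1) == bit_value)
    return matches / total if total else 0.0


rng = random.Random(0)
relation = [(pk, rng.randrange(1000, 2000)) for pk in range(500)]
marked = mark_relation(relation, b"owner-secret")

# Reorder the tuples and drop half of them, as an update or attack might.
subset = marked[:]
rng.shuffle(subset)
subset = subset[:250]

print(match_rate(subset, b"owner-secret"))
print(match_rate(subset, b"some-other-key"))
```

Because the selection and the expected bit are recomputed from each surviving tuple's primary key, neither the shuffle nor the deletions affect the check: with the owner's key every selected tuple in the subset matches, whereas with a different key only about half would be expected to match by chance.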
Because of these differences, techniques developed for
multimedia data cannot be directly used for watermarking
relations. To elaborate on this point, let us map a relation to
an image by treating every attribute value as a pixel.
Unfortunately, the “image” thus defined will lack many properties
of a real image. For instance, pixels in a neighborhood in a
real image are usually highly correlated, and this assumption