The VLDB Journal (2013) 22:849–874
YmalDB: exploring relational databases via result-driven recommendations
Marina Drosou · Evaggelia Pitoura
Received: 22 February 2012 / Revised: 8 February 2013 / Accepted: 13 March 2013 / Published online: 17 May 2013
© Springer-Verlag Berlin Heidelberg 2013
Abstract The typical user interaction with a database system
is through queries. However, many times users do not
have a clear understanding of their information needs or
the exact content of the database. In this paper, we propose
assisting users in database exploration by recommending to
them additional items, called Ymal (“You May Also Like”)
results, that, although not part of the result of their original
query, appear to be highly related to it. Such items are computed
based on the most interesting sets of attribute values,
called faSets, that appear in the result of the original query.
The interestingness of a faSet is defined based on its frequency
in the query result and in the database. Database frequency
estimations rely on a novel approach of maintaining
a set of representative rare faSets. We have implemented our
approach and report results regarding both its performance
and its usefulness.
Keywords Recommendations · Faceted search ·
Typically, users interact with a database system by formulating
queries. This query-response mode of interaction
assumes that users are to some extent familiar with the content
of the database and that they have a clear understanding
of their information needs. However, as databases become
larger and accessible to a more diverse and less technically
oriented audience, a more exploratory mode of information
seeking seems relevant and useful.
M. Drosou · E. Pitoura
Computer Science Department, University of Ioannina
Previous research has mainly focused on assisting users
in refining or generalizing their queries. Approaches to the
many-answers problem range from reformulating the original
query so as to restrict the size of the result, for example,
by adding constraints to the query (e.g., ), to automatically
ranking query results and presenting to users only
the top-k most highly ranked among them (e.g., ). With
facet search (e.g., ), users start with a general query and
progressively narrow its results down to a specific item by
specifying at each step facet conditions, i.e., restrictions on
attribute values. The empty-answers problem is commonly
handled by relaxing the original query (e.g., ).
In this paper, we propose a novel exploratory mode of
database interaction that allows users to discover items that,
although not part of the result of their original query, are
highly correlated with this result.
In particular, at first, the interesting parts of the result
of the initial user query are identified. These are sets of
(attribute, value) pairs, called faSets, that are highly relevant
to the query. For example, assume a user who asks about the
characteristics (such as genre, production year or country) of
movies by a specific director, e.g., M. Scorsese. Our system
will highlight the interesting aspects of these results, e.g.,
interesting years, pairs of genre and years, and so on (Fig. 1).
The interestingness of each faSet is based on its frequency.
Intuitively, the more frequent a faSet is in the result, the more
relevant it is to the query. To account for popular faSets, we
also consider their frequency in the database. For example,
a movie genre may appear more frequently than another not
because of the specific director but because it is a very
common genre. To address the fundamental problem of locating
interesting faSets efficiently, we introduce appropriate
data structures and algorithms.
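To make the intuition concrete, the following is a minimal Python sketch of scoring faSets by comparing their frequency in a query result with their frequency in the database. The text here does not give the exact formula, so the ratio of relative frequencies used below is an assumption, as are the function names (`fasets`, `support`, `interestingness`) and the toy movie rows, which are invented for illustration. The sketch enumerates faSets by brute force and does not reflect the data structures and algorithms mentioned above.

```python
from collections import Counter
from itertools import combinations


def fasets(row, max_size=2):
    """Yield every set of (attribute, value) pairs of size <= max_size in a row."""
    pairs = sorted(row.items())
    for k in range(1, max_size + 1):
        for combo in combinations(pairs, k):
            yield frozenset(combo)


def support(rows, max_size=2):
    """Map each faSet to the fraction of rows that contain it."""
    counts = Counter(f for row in rows for f in fasets(row, max_size))
    return {f: c / len(rows) for f, c in counts.items()}


def interestingness(result_rows, database_rows, max_size=2):
    """Assumed score: relative frequency in the result divided by
    relative frequency in the database. The result rows must be a
    subset of the database rows, so every faSet has database support."""
    in_result = support(result_rows, max_size)
    in_db = support(database_rows, max_size)
    return {f: p / in_db[f] for f, p in in_result.items()}


# Invented toy data: (director, genre, year) for eight hypothetical movies.
movies = [
    {"director": "M. Scorsese", "genre": "Biography", "year": 2004},
    {"director": "M. Scorsese", "genre": "Biography", "year": 2004},
    {"director": "M. Scorsese", "genre": "Drama", "year": 2004},
    {"director": "M. Scorsese", "genre": "Drama", "year": 1990},
    {"director": "Director A", "genre": "Drama", "year": 1990},
    {"director": "Director B", "genre": "Drama", "year": 1990},
    {"director": "Director C", "genre": "Drama", "year": 2004},
    {"director": "Director D", "genre": "Drama", "year": 1990},
]


def project(movie):
    """Keep only the attributes the example query asks about."""
    return {"genre": movie["genre"], "year": movie["year"]}


database = [project(m) for m in movies]
result = [project(m) for m in movies if m["director"] == "M. Scorsese"]

scores = interestingness(result, database)
for faset, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(sorted(faset), round(score, 2))
```

In this toy data, the two genres are equally frequent in the result (two of four rows each), yet the faSet {genre = Biography} scores 2.0 while {genre = Drama} scores about 0.67, because Drama is common across the whole database. This is the effect the paragraph above describes: frequency in the result alone would not separate the two.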