Exploiting predicate-window semantics over data streams
Ghanem, Thanaa M.; Aref, Walid G.; Elmagarmid, Ahmed K.
doi: 10.1145/1121995.1121996
The continuous sliding-window query model is widely used in data stream management systems, where the focus of a continuous query is limited to a set of the most recent tuples. In this paper, we show that an interesting and important class of queries over data streams cannot be answered using the sliding-window query model. We therefore introduce a new model for continuous window queries, termed the predicate-window query model, which limits the focus of a continuous query to the stream tuples that satisfy a certain predicate. Predicate-window queries have some distinguishing characteristics: (1) the window predicate can be defined over any attribute in the stream tuple (ordered or unordered), and (2) stream tuples qualify for and are disqualified from the window predicate in an out-of-order manner. We discuss the applicability of the predicate-window query model and show how the existing sliding-window query models fail to answer some predicate-window queries. Finally, we discuss the challenges in supporting the predicate-window query model in data stream management systems.
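The predicate-window idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes (hypothetically) that each stream tuple carries a key, and that a later tuple with the same key replaces the earlier one, so a key enters the window when its latest value satisfies the predicate and leaves it when a later value does not. The names `PredicateWindow`, `process`, and `contents` are invented for this illustration.

```python
from typing import Callable, Dict, Hashable


class PredicateWindow:
    """Keeps only the keys whose most recent value satisfies the predicate."""

    def __init__(self, predicate: Callable[[float], bool]) -> None:
        self.predicate = predicate
        self.contents: Dict[Hashable, float] = {}

    def process(self, key: Hashable, value: float) -> None:
        # Assumption: a new tuple for a key supersedes the previous one.
        if self.predicate(value):
            self.contents[key] = value      # tuple qualifies: insert or update
        else:
            self.contents.pop(key, None)    # tuple disqualifies: expire if present


# Hypothetical stream of (sensor_id, temperature) readings.
window = PredicateWindow(lambda temperature: temperature > 30)
for sensor_id, temperature in [("s1", 25), ("s2", 35), ("s1", 40), ("s2", 20)]:
    window.process(sensor_id, temperature)
```

In this sketch, "s2" enters the window before "s1" but also leaves it first because of a later reading, so expiration does not follow arrival order, unlike in a sliding window.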
Micro-views, or on how to protect privacy while enhancing data usability: concepts and challenges
Byun, Ji-Won; Bertino, Elisa
doi: 10.1145/1121995.1121997
The wide availability of repositories storing various types of information about individuals has raised serious privacy concerns over the past decade. Nonetheless, database technology is far from providing adequate solutions to this problem, which requires a delicate balance between an individual's privacy and convenience on the one hand and data usability by enterprises and organizations on the other - a database that is rigid and over-protective may render data of little value. Though these goals may seem at odds, we argue that the development of solutions able to reconcile them will be an important challenge to be addressed in the next few years. We believe that the next wave of database technology will be represented by a DBMS that provides high-assurance privacy and security. In this paper, we elaborate on such challenges. In particular, we argue that we need to provide different views of data at a very fine level of granularity; conventional view technology is able to select only down to a single attribute value for a single tuple. We need to go even beyond this level. That is, we need a mechanism by which even a single value inside a tuple's attribute may have different views; we refer to these as micro-views. We believe that such a mechanism can be an important building block, together with other mechanisms and tools, of the next wave of database technology.
Research issues in data stream association rule mining
Jiang, Nan; Gruenwald, Le
doi: 10.1145/1121995.1121998
Emerging data stream applications, such as network traffic monitoring and web click stream analysis, require association rule mining. Unlike data in traditional static databases, data streams typically arrive continuously, at high speed, in huge volumes, and with changing data distributions. This raises new issues that need to be considered when developing association rule mining techniques for stream data. This paper discusses those issues and how they are addressed in the existing literature.
Join minimization in XML-to-SQL translation: an algebraic approach
Mani, Murali; Wang, Song; Dougherty, Dan; Rundensteiner, Elke A.
doi: 10.1145/1121995.1121999
Consider an XML view defined over a relational database, and a user query specified over this view. This user XML query is typically processed using the following steps: (a) our translator maps the XML query to one or more SQL queries, (b) the relational engine translates an SQL query to a relational algebra plan, (c) the relational engine executes the algebra plan and returns SQL results, and (d) our translator translates the SQL results back to XML. However, a straightforward approach produces a relational algebra plan after step (b) that is inefficient and has redundant joins. In this paper, we report on our preliminary observations with respect to how joins in such a relational algebra plan can be minimized. Our approach works on the relational algebra plan and optimizes it using novel rewrite rules that consider pairs of joins in the plan and determine whether one of them is redundant and hence can be removed. Our study shows that algebraic techniques achieve effective join minimization, and such techniques are useful and can be integrated into mainstream SQL engines.
Dynamic count filters
Aguilar-Saborit, J.; Trancoso, P.; Muntes-Mulero, V.; Larriba-Pey, J. L.
doi: 10.1145/1121995.1122000
Bloom filters are not able to handle deletes and inserts on multisets over time. This matters in many situations where streamed data evolve rapidly and change patterns frequently. Counting Bloom Filters (CBF) have been proposed to overcome this limitation and allow for the dynamic evolution of Bloom filters. The only dynamic approach to a compact and efficient representation of CBF is the Spectral Bloom Filter (SBF). In this paper we propose the Dynamic Count Filters (DCF) as a new dynamic and space-time efficient representation of CBF. Although DCF does not make compact use of memory, it proves to be faster and more space efficient than any previous proposal. Results show that the proposed data structure is more efficient independently of the incoming data characteristics.
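As background for this abstract, the following is a minimal sketch of a plain Counting Bloom Filter, the structure that SBF and DCF aim to represent more efficiently. It is not the DCF proposed in the paper. The counter array size, the number of hash functions, and the SHA-256 based hashing are assumptions chosen for illustration, as are the names `CountingBloomFilter`, `insert`, `delete`, and `estimate`.

```python
import hashlib
from typing import Iterator, List


class CountingBloomFilter:
    """Bloom filter with one integer counter per slot, so deletes are possible."""

    def __init__(self, num_counters: int = 1024, num_hashes: int = 3) -> None:
        self.counters: List[int] = [0] * num_counters
        self.num_hashes = num_hashes

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes slot indexes by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def insert(self, item: str) -> None:
        for position in self._positions(item):
            self.counters[position] += 1

    def estimate(self, item: str) -> int:
        # Upper bound on the item's multiplicity; collisions can inflate it.
        return min(self.counters[position] for position in self._positions(item))

    def delete(self, item: str) -> None:
        # Only decrement when the item may be present, so counters stay >= 0.
        if self.estimate(item) > 0:
            for position in self._positions(item):
                self.counters[position] -= 1


cbf = CountingBloomFilter()
cbf.insert("a")
cbf.insert("a")
cbf.insert("b")
cbf.delete("a")
```

The estimate never undercounts an item that was only inserted and correctly deleted, but hash collisions can make it overcount, which is the usual Bloom filter false-positive trade-off.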
Towards a dynamic multi-policy dissemination control model: (DMDCON)
Li, Zude; Ye, Xiaojun
doi: 10.1145/1121995.1122001
Dissemination control (DCON) is a security policy for controlling access to digital resources before and after distribution. It is an extension of traditional access control within the client-side domain, of digital rights management for payment-free applications, and of originator control over the allowance of recipients' re-dissemination rights. Different application domains may dynamically adopt different resource dissemination policies, but current DCON models cannot solve the problems of multi-policy coexistence and compatibility. A dynamic multi-policy dissemination control model (DMDCON) is proposed to express the dynamic and multi-policy nature that exists in reality, which is indispensable for well-formed resource dissemination control applications. The goal of this paper is to formally define and extend some basic concepts related to resource dissemination (such as dissemination policy, chain, tree, etc.) and, further, to propose a comprehensive DMDCON model that describes universal resource dissemination applications by specifying temporal dissemination features, restrictions, and policy revocation (cascade or non-cascade). Finally, we briefly discuss the importance of DCON within the usage control domain.
B-tree indexes for high update rates
Graefe, Goetz
doi: 10.1145/1121995.1122002
In some applications, data capture dominates query processing. For example, monitoring moving objects often requires more insertions and updates than queries. Data gathering using automated sensors often exhibits this imbalance. More generally, indexing streams is considered an unsolved problem. For those applications, B-tree indexes are good choices if some trade-off decisions are tilted towards optimization of updates rather than towards optimization of queries. This paper surveys some techniques that let B-trees sustain very high update rates, up to multiple orders of magnitude higher than traditional B-trees, at the expense of query processing performance. Not surprisingly, some of these techniques are reminiscent of those employed during index creation, index rebuild, etc., while other techniques are derived from well known technologies such as differential files and log-structured file systems.
Moshe Vardi speaks out on the proof, the whole proof, and nothing but the proof
Winslett, Marianne
doi: 10.1145/1121995.1122008
Welcome to ACM SIGMOD Record's series of interviews with distinguished members of the database community. I'm Marianne Winslett, and I have here with me Moshe Vardi, who holds an endowed professorship at Rice University and is a former chair of their Computer Science Department. Before joining Rice, Moshe was a manager at IBM Almaden Research Center. Moshe is an ACM Fellow, a AAAI Fellow, a co-winner of the Goedel Prize, and a member of the U.S. National Academy of Engineering and the European Academy of Sciences. His research interests include databases, verification, complexity theory, and multi-agent systems, and his PhD is from the Hebrew University of Jerusalem. So Moshe, welcome!