Scientific Data Mining and Knowledge Discovery: Data Streams: An Overview and Scientific Applications

Charu C. Aggarwal


References (20)

Brian Babcock, Mayur Datar, R. Motwani (2002)
Sampling from a moving window over streaming data
G. Kollios, J. Byers, Jeffrey Considine, Marios Hadjieleftheriou, Feifei Li (2005)
Robust Aggregation in Sensor Networks
IEEE Data Eng. Bull., 28
Charu Aggarwal (2006)
Data Streams: Models and Algorithms (Advances in Database Systems)
Guozhu Dong, Jiawei Han, Joyce Lam, J. Pei, Ke Wang (2001)
Mining Multi-Dimensional Constrained Gradients in Data Cubes
G. Manku, R. Motwani (2012)
Approximate Frequency Counts over Data Streams
Proc. VLDB Endow., 5
Geoff Hulten, Laurie Spencer, Pedro Domingos (2001)
Mining time-changing data streams
R. Jin, G. Agrawal (2005)
An algorithm for in-core frequent itemset mining on streaming data
Fifth IEEE International Conference on Data Mining (ICDM'05)
C. Aggarwal, Philip Yu (2006)
A Framework for Clustering Massive Text and Categorical Data Streams
Byoung-Kee Yi, N. Sidiropoulos, T. Johnson, H. Jagadish, C. Faloutsos, A. Biliris (2000)
Online data mining for co-evolving time sequences
Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073)
C. Aggarwal, Philip Yu (2008)
A Framework for Clustering Uncertain Data Streams
2008 IEEE 24th International Conference on Data Engineering
P. Indyk (2000)
Stable distributions, pseudorandom generators, embeddings and data stream computation
Proceedings 41st Annual Symposium on Foundations of Computer Science
Pedro Domingos, Geoff Hulten (2000)
Mining high-speed data streams
Graham Cormode, M. Garofalakis (2005)
Sketching Streams Through the Net: Distributed Approximate Query Tracking
C. Aggarwal (2003)
A framework for diagnosing changes in evolving data streams
Graham Cormode, S. Muthukrishnan (2004)
An improved data stream summary: the count-min sketch and its applications
Yun Chi, Haixun Wang, Philip Yu, R. Muntz (2004)
Moment: maintaining closed frequent itemsets over a stream sliding window
Fourth IEEE International Conference on Data Mining (ICDM'04)
Yasushi Sakurai, S. Papadimitriou, C. Faloutsos (2005)
BRAID: stream mining through group lag correlations
C. Giannella, Jiawei Han, Xifeng Yan, Philip Yu (2002)
Mining Frequent Patterns in Data Streams at Multiple Time Granularities
C. Aggarwal (2006)
On biased reservoir sampling in the presence of stream evolution
J. Chang, W. Lee (2003)
Finding recent frequent itemsets adaptively over online data streams

Publisher: Springer Berlin Heidelberg
Copyright: © Springer-Verlag Berlin Heidelberg 2010
ISBN: 978-3-642-02787-1
Pages: 377–397
DOI: 10.1007/978-3-642-02788-8_14

Abstract

In recent years, advances in hardware technology have made it possible to collect data continuously. Simple transactions of everyday life, such as using a credit card or a phone or browsing the web, lead to automated data storage. Similarly, advances in information technology have led to large flows of data across IP networks. In many cases, these large volumes of data can be mined for interesting and relevant information in a wide variety of applications. When the volume of the underlying data is very large, it leads to a number of computational and mining challenges.

With increasing volume of the data, it is no longer possible to process the data efficiently by using multiple passes. Rather, one can process a data item at most once. This constrains the implementation of the underlying algorithms, so stream mining algorithms typically need to be designed to work with one pass over the data.

In most cases, there is an inherent temporal component to the stream mining process, because the data may evolve over time. This behavior of data streams is referred to as temporal locality. A straightforward adaptation of one-pass mining algorithms may therefore not be an effective solution to the task. Stream mining algorithms need to be carefully designed with a clear focus on the evolution of the underlying data.

Another important characteristic of data streams is that they are often mined in a distributed fashion. Furthermore, the individual processors may have limited processing power and memory. Examples of such cases include sensor networks, in which it may be desirable to perform in-network processing of data streams with limited processing power and memory [1, 2].

This chapter provides an overview of the key challenges in stream mining algorithms that arise from the unique setup in which these problems are encountered. It is organized as follows. The next section discusses the generic challenges that stream mining poses to a variety of data management and data mining problems, and also deals with several issues that arise in the context of data stream management. Sect. 3 discusses several mining algorithms on the data stream model. Sect. 4 discusses various scientific applications of data streams. Sect. 5 discusses the research directions and conclusions.
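
To make the one-pass constraint concrete, the following is a minimal sketch of classical reservoir sampling, which maintains a uniform random sample of fixed size while reading each item exactly once. It is an illustration chosen here, not code taken from the chapter; the function name `reservoir_sample` and its parameters `k` and `seed` are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k items, reading the stream once.

    Hypothetical helper for illustration: memory use is O(k) regardless of
    how long the stream is, and no item is ever revisited.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the n-th item with probability k/n by drawing a slot
            # uniformly from [0, n) and replacing only if it falls in [0, k).
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # A generator can be consumed only once, which mimics a data stream.
    readings = (x * x for x in range(100_000))
    print(reservoir_sample(readings, k=5, seed=42))
```

This plain version weights all past items equally, so it does not by itself address the temporal evolution discussed above; variants that favor recent items, such as the biased reservoir sampling listed in the references, are designed for that setting.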

Published: Jul 31, 2009

Keywords: Data Stream; Frequent Pattern; Mining Algorithm; Frequent Itemsets; Stream Mining
