Quantiles over data streams: experimental comparisons, new analyses, and further improvements

Ge Luo; Lu Wang; Ke Yi; Graham Cormode

doi:10.1007/s00778-016-0424-7

Publisher: Springer Journals
Copyright: Copyright © 2016 by Springer-Verlag Berlin Heidelberg
Subject: Computer Science; Database Management
ISSN: 1066-8888
eISSN: 0949-877X
DOI: 10.1007/s00778-016-0424-7

Abstract

A fundamental problem in data management and analysis is to generate descriptions of the distribution of data. It is most common to give such descriptions in terms of the cumulative distribution, which is characterized by the quantiles of the data. The design and engineering of efficient methods to find these quantiles has attracted much study, especially in the case where the data are given incrementally, and we must compute the quantiles in an online, streaming fashion. While such algorithms have proved to be extremely useful in practice, there has been limited formal comparison of the competing methods, and no comprehensive study of their performance. In this paper, we remedy this deficit by providing a taxonomy of different methods and describing efficient implementations. In doing so, we propose new variants that have not been studied before, yet which outperform existing methods. To illustrate this, we provide detailed experimental comparisons demonstrating the trade-offs between space, time, and accuracy for quantile computation.

Journal

The VLDB Journal – Springer Journals

Published: Feb 8, 2016
