Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Quantiles over data streams: experimental comparisons, new analyses, and further improvements

Quantiles over data streams: experimental comparisons, new analyses, and further improvements A fundamental problem in data management and analysis is to generate descriptions of the distribution of data. It is most common to give such descriptions in terms of the cumulative distribution, which is characterized by the quantiles of the data. The design and engineering of efficient methods to find these quantiles has attracted much study, especially in the case where the data are given incrementally, and we must compute the quantiles in an online, streaming fashion. While such algorithms have proved to be extremely useful in practice, there has been limited formal comparison of the competing methods, and no comprehensive study of their performance. In this paper, we remedy this deficit by providing a taxonomy of different methods and describe efficient implementations. In doing so, we propose new variants that have not been studied before, yet which outperform existing methods. To illustrate this, we provide detailed experimental comparisons demonstrating the trade-offs between space, time, and accuracy for quantile computation. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png The VLDB Journal Springer Journals

Quantiles over data streams: experimental comparisons, new analyses, and further improvements

The VLDB Journal , Volume 25 (4) – Feb 8, 2016

Loading next page...
 
/lp/springer_journal/quantiles-over-data-streams-experimental-comparisons-new-analyses-and-OJQ2LZB6Wl

References (32)

Publisher
Springer Journals
Copyright
Copyright © 2016 by Springer-Verlag Berlin Heidelberg
Subject
Computer Science; Database Management
ISSN
1066-8888
eISSN
0949-877X
DOI
10.1007/s00778-016-0424-7
Publisher site
See Article on Publisher Site

Abstract

A fundamental problem in data management and analysis is to generate descriptions of the distribution of data. It is most common to give such descriptions in terms of the cumulative distribution, which is characterized by the quantiles of the data. The design and engineering of efficient methods to find these quantiles has attracted much study, especially in the case where the data are given incrementally, and we must compute the quantiles in an online, streaming fashion. While such algorithms have proved to be extremely useful in practice, there has been limited formal comparison of the competing methods, and no comprehensive study of their performance. In this paper, we remedy this deficit by providing a taxonomy of different methods and describe efficient implementations. In doing so, we propose new variants that have not been studied before, yet which outperform existing methods. To illustrate this, we provide detailed experimental comparisons demonstrating the trade-offs between space, time, and accuracy for quantile computation.

Journal

The VLDB JournalSpringer Journals

Published: Feb 8, 2016

There are no references for this article.