SCALLA: A Platform for Scalable One-Pass Analytics Using MapReduce

References (48)

Publisher: Association for Computing Machinery
Copyright: © 2012 by ACM Inc.
ISSN: 0362-5915
DOI: 10.1145/2389241.2389246

Abstract

Boduo Li, Edward Mazur, Yanlei Diao, Andrew McGregor, and Prashant Shenoy, University of Massachusetts Amherst

Today's one-pass analytics applications tend to be data-intensive and must process high volumes of data efficiently. MapReduce is a popular programming model for processing large datasets on a cluster of machines. However, the traditional MapReduce model is not well suited to one-pass analytics, since it is geared towards batch processing and requires the dataset to be fully loaded into the cluster before analytical queries can run. This article examines, from a systems standpoint, what architectural design changes are necessary to bring the benefits of the MapReduce model to incremental one-pass analytics. Our empirical and theoretical analyses of Hadoop-based MapReduce systems show that the widely used sort-merge implementation for partitioning and parallel processing poses a fundamental barrier to incremental one-pass analytics, despite various optimizations. To address these limitations, we propose a new data analysis platform that employs hash techniques to enable fast in-memory processing, and a new frequent-key-based technique to extend such processing to workloads that require a large key-state space. Evaluation of our Hadoop-based prototype using real-world workloads shows
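The contrast the abstract draws between sort-merge and hash-based processing can be illustrated with a small sketch. It is not the authors' implementation: it assumes a simple per-key count as the reduce function, uses a Misra-Gries-style weight as a stand-in for the frequent-key technique, and models disk spills with an in-memory list. The names `one_pass_counts`, `capacity`, and `spilled` are hypothetical.

```python
from collections import defaultdict


def one_pass_counts(stream, capacity):
    """Count keys in one pass, holding at most `capacity` keys in memory.

    Assumption: a Misra-Gries-style weight decides which keys stay resident.
    This is an illustrative substitute, not the technique from the article.
    """
    in_memory = {}  # key -> [partial count, frequency weight]
    spilled = []    # stands in for an on-disk spill file of (key, partial count)

    for key in stream:
        if key in in_memory:
            # Hash hit: update the key's state immediately, no sorting needed.
            state = in_memory[key]
            state[0] += 1
            state[1] += 1
        elif len(in_memory) < capacity:
            in_memory[key] = [1, 1]
        else:
            # Table full: spill the new record and decay all resident weights.
            spilled.append((key, 1))
            for resident in list(in_memory):
                in_memory[resident][1] -= 1
                if in_memory[resident][1] == 0:
                    # Evict a key that no longer looks frequent.
                    spilled.append((resident, in_memory.pop(resident)[0]))

    # Final step: combine resident states with spilled partial counts.
    totals = defaultdict(int)
    for key, (count, _weight) in in_memory.items():
        totals[key] += count
    for key, count in spilled:
        totals[key] += count
    return dict(totals), spilled


totals, spilled = one_pass_counts(["a", "b", "a", "c", "a", "b", "a"], capacity=2)
```

In this sketch the frequent key stays in the hash table and is aggregated as records arrive, while infrequent keys are pushed to the spill list and reconciled at the end. A sort-merge design would instead have to see and sort all records before any key's result could be produced.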

Journal

ACM Transactions on Database Systems (TODS), Association for Computing Machinery

Published: Dec 1, 2012
