M. Eltabakh, Yuanyuan Tian, Fatma Özcan, Rainer Gemulla, Aljoscha Krettek, J. McPherson (2011)
CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop. Proc. VLDB Endow., 4
Nicolas Bruno, S. Chaudhuri (2006)
To tune or not to tune?: a lightweight physical design alerter
J. Dittrich, Jorge-Arnulfo Quiané-Ruiz, Stefan Richter, Stefan Schuh, Alekh Jindal, Jörg Schad (2012)
Only Aggressive Elephants are Fast Elephants. arXiv, abs/1208.0287
Alekh Jindal, Jorge-Arnulfo Quiané-Ruiz, J. Dittrich (2011)
Trojan data layouts: right shoes for a running elephant. Proceedings of the 2nd ACM Symposium on Cloud Computing
(2010)
Runtime measurements in the cloud: observing, analyzing, and reducing variance
(2011)
Keynote: programming and debugging large-scale data processing workflows
Stratos Idreos, M. Kersten, S. Manegold (2007)
Database Cracking
A. Ailamaki, D. DeWitt, M. Hill, Marios Skounakis (2001)
Weaving Relations for Cache Performance
G. Graefe, Harumi Kuno (2010)
Self-selecting, self-tuning, incrementally optimized indexes
Andrew Pavlo, Erik Paulson, A. Rasin, D. Abadi, D. DeWitt, S. Madden, M. Stonebraker (2009)
A comparison of approaches to large-scale data analysis. Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data
Nicolas Bruno, S. Chaudhuri (2007)
Physical design refinement: The ‘merge-reduce’ approach. ACM Trans. Database Syst., 32
Tom White (2009)
Hadoop: The Definitive Guide
Jimmy Lin, D. Ryaboy, Kevin Weil (2011)
Full-text indexing for optimizing selection operations in large-scale data analytics
S. Chaudhuri, Vivek Narasayya (2007)
Self-Tuning Database Systems: A Decade of Progress
S. J. Finkelstein (1988)
Physical database design for relational databases. ACM TODS, 13
E. Jahani, Michael Cafarella, C. Ré (2011)
Automatic Optimization for MapReduce Programs. arXiv, abs/1104.3217
J. Dittrich, Peter Fischer, Donald Kossmann (2005)
AGILE: adaptive indexing for context-aware information filters
(2012)
Hadoop Users
Nicolas Bruno, S. Chaudhuri (2007)
An Online Approach to Physical Design Tuning. 2007 IEEE 23rd International Conference on Data Engineering
A. Abouzeid, D. Abadi, A. Silberschatz (2013)
Invisible loading: access-driven data transfer from raw files into database systems
J. Dittrich, Jorge-Arnulfo Quiané-Ruiz (2012)
Efficient Big Data Processing in Hadoop MapReduce. Proc. VLDB Endow., 5
M. Zaharia, Dhruba Borthakur, Joydeep Sarma, Khaled Elmeleegy, S. Shenker, Ion Stoica (2010)
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Felix Halim, Stratos Idreos, Panagiotis Karras, R. Yap (2012)
Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores. Proc. VLDB Endow., 5
Dionysios Logothetis, Chris Trezzo, Kevin Webb, K. Yocum (2011)
In-situ MapReduce for Log Processing
(2010)
The performance of MapReduce: an in-depth study
Ashish Thusoo, Zheng Shao, Suresh Anthony, Dhruba Borthakur, Namit Jain, Joydeep Sarma, R. Murthy, H. Liu (2010)
Data warehousing and analytics infrastructure at Facebook. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data
H. Herodotou, S. Babu (2011)
Profiling, what-if analysis, and cost-based optimization of MapReduce programs. Proc. VLDB Endow., 4
Stratos Idreos, M. Kersten, S. Manegold (2009)
Self-organizing tuple reconstruction in column-stores. Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data
Stratos Idreos, S. Manegold, Harumi Kuno, G. Graefe (2011)
Merging What's Cracked, Cracking What's Merged: Adaptive Indexing in Main-Memory Column-Stores. Proc. VLDB Endow., 4
Jorge-Arnulfo Quiané-Ruiz, C. Pinkel, Jörg Schad, J. Dittrich (2011)
RAFTing MapReduce: Fast recovery on the RAFT. 2011 IEEE 27th International Conference on Data Engineering
Michael Cafarella, C. Ré (2010)
Manimal: relational optimization for data-intensive programs
Stratos Idreos, M. Kersten, S. Manegold (2007)
Updating a cracked database
M. Lühring, K. Sattler, Karsten Schmidt, E. Schallehn (2007)
Autonomous Management of Soft Indexes. 2007 IEEE 23rd International Conference on Data Engineering Workshop
Stratos Idreos, Ioannis Alagiannis, Ryan Johnson, A. Ailamaki (2011)
Here are my Data Files. Here are my Queries. Where are my Results?
J. Dean, S. Ghemawat (2010)
MapReduce: a flexible data processing tool. Commun. ACM, 53
Ioannis Alagiannis, Renata Borovica-Gajic, Miguel Branco, Stratos Idreos, A. Ailamaki (2012)
NoDB: efficient query execution on raw data files. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
G. Graefe, Felix Halim, Stratos Idreos, Harumi Kuno, S. Manegold (2012)
Concurrency Control for Adaptive Indexing. Proc. VLDB Endow., 5
Spyros Blanas, J. Patel, V. Ercegovac, Jun Rao, E. Shekita, Yuanyuan Tian (2010)
A comparison of join algorithms for log processing in MapReduce. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data
Karl Schnaitter, S. Abiteboul, T. Milo, N. Polyzotis (2006)
COLT: continuous on-line tuning. Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data
S. Chaudhuri, Vivek Narasayya (1997)
An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server
Songting Chen (2010)
Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce. Proc. VLDB Endow., 3
J. Dittrich, Jorge-Arnulfo Quiané-Ruiz, Alekh Jindal, Y. Kargin, Vinay Setty, Jörg Schad (2010)
Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing). Proc. VLDB Endow., 3
S. Agrawal, S. Chaudhuri, L. Kollár, A. Marathe, Vivek Narasayya, Manoj Syamala (2005)
Database tuning advisor for Microsoft SQL Server 2005: demo
Hung-chih Yang, D. Parker (2009)
Traverse: Simplified Indexing on Large Map-Reduce-Merge Clusters
Hadoop MapReduce has evolved into an important industry standard for massively parallel data processing and has been widely adopted for a variety of use cases. Recent work has shown that indexes can dramatically improve the performance of selective MapReduce jobs. However, a major weakness of existing approaches is their high index creation cost. We present HAIL (Hadoop Aggressive Indexing Library), a novel indexing approach for HDFS and Hadoop MapReduce. HAIL creates different clustered indexes over terabytes of data at minimal, often invisible, cost, and it dramatically improves the runtimes of several classes of MapReduce jobs. HAIL features two indexing pipelines: static indexing and adaptive indexing. HAIL static indexing efficiently indexes datasets while uploading them to HDFS. To do so, HAIL leverages Hadoop's default replication and enhances it with logical replication. This allows HAIL to create multiple clustered indexes for a dataset, e.g., one for each physical replica. Despite this extra work, HAIL matches or even improves on the upload time of standard HDFS. Additionally, HAIL adaptive indexing allows for automatic, incremental indexing at job runtime with minimal runtime overhead. For example, HAIL adaptive indexing can completely index a dataset as a byproduct of only four MapReduce jobs, while incurring an overhead as low as 11% for only the first of those jobs. In our experiments, we show that HAIL improves job runtimes by up to 68× over Hadoop. This article is an extended version of the VLDB 2012 paper (Dittrich et al. in PVLDB 5(11):1591–1602, 2012).
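The two indexing pipelines described in the abstract can be illustrated with a small in-memory Python sketch. It is not HAIL's implementation: it assumes a block is a list of integer tuples, treats a "replica" as a copy of the block sorted on one column and searched by binary search (standing in for a clustered index), and uses hypothetical names (`Replica`, `build_replicas`, `select`, `adaptive_job`, `offer_rate`). The default `offer_rate` of 0.25 is an assumption chosen so that four jobs index every block, mirroring the four-job example in the abstract; the sketch does not model I/O, HDFS block placement, or runtime overhead.

```python
import math
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Tuple[int, ...]


@dataclass
class Replica:
    """Hypothetical clustered copy of one block, sorted on sort_col."""
    sort_col: int
    rows: List[Row]   # rows ordered by row[sort_col]
    keys: List[int]   # the sort column alone, searched with bisect


def build_replicas(block: List[Row], index_cols: List[int]) -> List[Replica]:
    """Static indexing sketch: instead of identical copies, give each
    replica of the block its own sort order (logical replication)."""
    replicas = []
    for col in index_cols:
        rows = sorted(block, key=lambda r: r[col])
        replicas.append(Replica(col, rows, [r[col] for r in rows]))
    return replicas


def lookup(rep: Replica, lo: int, hi: int) -> List[Row]:
    """Range lookup lo <= key <= hi on a clustered replica via binary search."""
    return rep.rows[bisect_left(rep.keys, lo):bisect_right(rep.keys, hi)]


def select(replicas: List[Replica], col: int, lo: int, hi: int) -> Tuple[List[Row], bool]:
    """Route a selection to the replica clustered on col; otherwise full-scan.
    Returns (matching rows, whether an index was used)."""
    for rep in replicas:
        if rep.sort_col == col:
            return lookup(rep, lo, hi), True
    return [r for r in replicas[0].rows if lo <= r[col] <= hi], False


def adaptive_job(blocks: List[List[Row]], index: Dict[int, Replica],
                 col: int, lo: int, hi: int, offer_rate: float = 0.25) -> List[Row]:
    """Adaptive indexing sketch: answer the selection over all blocks and, as a
    byproduct of scanning, index at most ceil(offer_rate * n) unindexed blocks
    on col. `index` maps block id -> clustered replica for this one column."""
    budget = math.ceil(offer_rate * len(blocks))
    out: List[Row] = []
    for b_id, block in enumerate(blocks):
        if b_id in index:  # already indexed: binary search instead of a scan
            out.extend(lookup(index[b_id], lo, hi))
            continue
        out.extend(r for r in block if lo <= r[col] <= hi)  # full scan
        if budget > 0:  # piggyback index creation on the scan we just did
            index[b_id] = build_replicas(block, [col])[0]
            budget -= 1
    return out


if __name__ == "__main__":
    # 8 blocks of 10 rows; column 1 holds a permutation of 0..9 in each block.
    blocks = [[(b * 10 + i, (7 * i) % 10) for i in range(10)] for b in range(8)]
    index: Dict[int, Replica] = {}
    for job in range(1, 5):
        hits = adaptive_job(blocks, index, col=1, lo=2, hi=4)
        print(job, len(hits), len(index))  # prints: 1 24 2 / 2 24 4 / 3 24 6 / 4 24 8
```

The sketch highlights the trade-off between the two pipelines: static indexing pays the sorting cost once per replica at upload time, so a selection on any indexed column avoids a full scan from the first job onward, whereas adaptive indexing spreads the same work across jobs that scan the data anyway and bounds the extra work per job by the offer rate.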
The VLDB Journal, Springer
Published: Jun 1, 2014