Integrating Hadoop and Parallel DBMS

Association for Computing Machinery — Jun 6, 2010

Datasource
Association for Computing Machinery
Copyright
Copyright © 2010 by ACM Inc.
ISBN
978-1-4503-0032-2
doi
10.1145/1807167.1807272

Abstract

Yu Xu, Pekka Kostamaa, Like Gao
Teradata, San Diego, CA, USA and El Segundo, CA, USA
{yu.xu,pekka.kostamaa,like.gao}@teradata.com

Teradata's parallel DBMS has been successfully deployed in large data warehouses over the last two decades for large-scale business analysis in various industries, over data sets ranging from a few terabytes to multiple petabytes. However, due to the explosive increase in data volume in recent years at some customer sites, some data, such as web logs and sensor data, are not managed by the Teradata EDW (Enterprise Data Warehouse), partly because it is very expensive to load such extremely large volumes of data into an RDBMS, especially when those data are not frequently used to support important business decisions. Recently the MapReduce programming paradigm, started by Google and made popular by the open source Hadoop implementation with major support from Yahoo!, has been gaining rapid momentum in both academia and industry as another way of performing large-scale data analysis. By now most data warehouse researchers and practitioners agree that both the parallel DBMS and MapReduce paradigms have advantages and disadvantages for various business applications and thus both paradigms are