Characterizing cloud computing hardware reliability

Kashi Venkatesh Vishwanath and Nachiappan Nagappan
Microsoft Research, One Microsoft Way, Redmond WA 98052
{kashi.vishwanath,nachin}@microsoft.com

Association for Computing Machinery — Jun 10, 2010


Datasource
Association for Computing Machinery
Copyright
The ACM Portal is published by the Association for Computing Machinery. Copyright © 2010 ACM, Inc.
ISBN
978-1-4503-0036-0
doi
10.1145/1807128.1807161

Abstract

Modern day datacenters host hundreds of thousands of servers that coordinate tasks in order to deliver highly available cloud computing services. These servers consist of multiple hard disks, memory modules, network cards, processors etc., each of which while carefully engineered are capable of failing. While the probability of seeing any such failure in the lifetime (typically 3-5 years in industry) of a server can be somewhat small, these numbers get magnified across all devices hosted in a datacenter. At such a large scale, hardware component failure is the norm rather than an exception. Hardware failure can lead to a degradation in performance to end-users and can result in losses to the business. A sound understanding of the numbers as well as the causes behind these failures helps improve operational experience by not only allowing us to be better equipped to tolerate failures but also to bring down the hardware cost through engineering, directly leading to a saving for the company. To the best of our knowledge, this paper is the first attempt to study server failures
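
The scaling argument in the abstract, that a small per-device failure probability is magnified across a whole datacenter, can be illustrated with a short sketch. It is not taken from the paper: the function names are hypothetical, the numbers in the usage lines are made up for illustration, and the second function assumes that devices fail independently with the same probability, which real fleets may not satisfy.

```python
import math


def expected_failures(num_devices: int, p_fail: float) -> float:
    """Expected number of failed devices, given a per-device failure probability.

    Holds by linearity of expectation, with or without independence.
    """
    return num_devices * p_fail


def prob_at_least_one_failure(num_devices: int, p_fail: float) -> float:
    """Probability that at least one of num_devices fails.

    ASSUMPTION (not from the paper): failures are independent and every
    device has the same failure probability p_fail over the period.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be in [0, 1]")
    if p_fail == 1.0:
        return 1.0
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(num_devices * math.log1p(-p_fail))


if __name__ == "__main__":
    # Illustrative values only: a 1% per-server failure probability.
    for n in (1, 100, 100_000):
        print(n, expected_failures(n, 0.01), prob_at_least_one_failure(n, 0.01))
```

Under these assumptions, a failure that is unlikely for any single server becomes close to certain somewhere in a fleet of hundreds of thousands, which is the sense in which failure is "the norm rather than an exception".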