Characterizing cloud computing hardware reliability
Vishwanath, Kashi Venkatesh; Nagappan, Nachiappan
2010-06-10
Microsoft Research, One Microsoft Way, Redmond WA 98052
{kashi.vishwanath,nachin}@microsoft.com

ABSTRACT
Modern-day datacenters host hundreds of thousands of servers that coordinate tasks in order to deliver highly available cloud computing services. These servers consist of multiple hard disks, memory modules, network cards, processors, etc., each of which, while carefully engineered, is capable of failing. While the probability of seeing any such failure in the lifetime of a server (typically 3-5 years in industry) can be somewhat small, these numbers get magnified across all devices hosted in a datacenter. At such a large scale, hardware component failure is the norm rather than the exception. Hardware failure can degrade performance for end users and can result in losses to the business. A sound understanding of the numbers, as well as the causes behind these failures, helps improve operational experience, not only by leaving us better equipped to tolerate failures but also by bringing down hardware cost through engineering, directly leading to savings for the company. To the best of our knowledge, this paper is the first attempt to study server failures