A short introduction to failure detectors for asynchronous distributed systems
ACM SIGACT News Distributed Computing Column 17
Sergio Rajsbaum

Abstract
The Distributed Computing Column covers the theory of systems that are composed of a number of interacting computing elements. These include problems of communication and networking, databases, distributed shared memory, multiprocessor architectures, operating systems, verification, Internet, and the Web.

This issue consists of:
- A Short Introduction to Failure Detectors for Asynchronous Distributed Systems, an introductory survey by Michel Raynal for readers who want to quickly understand the aim, the basic principles, the power, and the limitations of the failure detector concept.

Many thanks to Michel for his contribution to this issue.

Request for Collaborations: Please send me any suggestions for material I should include in this column, including news and communications, open problems, and authors willing to write a guest column or to review an event related to the theory of distributed computing.

A Short Introduction to Failure Detectors for Asynchronous Distributed Systems
Michel Raynal

Abstract
Since the first version of Chandra and Toueg's seminal paper, "Unreliable failure detectors for reliable distributed systems", appeared in 1991, the failure detector concept has been extensively studied. This is not surprising, as failure detection is pervasive in the design, analysis, and implementation of many fault-tolerant distributed algorithms that form the core of distributed system middleware. The literature on this topic is mostly technical and appears mainly in theoretically inclined journals and conferences. The aim of this paper is to offer an introductory survey of the failure detector concept for readers who are not familiar with it and want to quickly understand its aim, its basic principles, its power, and its limitations. To attain this goal, the paper first describes the motivations that underlie the concept, and then surveys...