Fault-Tolerant Management of Distributed Applications Using the Reactive System Architecture
Abstract
Distributed applications are becoming increasingly pervasive and increasingly difficult to manage. Examples include operating system servers and clients on a network, programs performing distributed computations, and systems constructed by integrating stand-alone programs. This thesis argues that distributed applications can be managed efficiently by using a reactive system architecture. A reactive system consists of a control component that continuously responds to changes in an environment component. This structure is applied to distributed application management by casting the programs that make up the application as the environment and superimposing a layer of control. By acting upon conditions sensed in the environment, the control layer can respond to changes in the distributed application, ensuring that it functions in a well-behaved manner. This thesis also presents the Meta toolkit, which provides primitives for controlling distributed applications using the reactive system architecture. The application components are instrumented with sensors and actuators, routines that respectively read and modify the application state. Control of the application is carried out via guarded commands, which are distributed for execution either by stubs coresident with programs in the application or by special servers. Distributing the control program yields greater responsiveness and efficiency but requires certain consistency problems to be addressed. Furthermore, the Meta toolkit supports fault-tolerant execution of guarded commands through the use of replicated servers. The toolkit has been implemented and is fully functional, and this thesis reports extensive performance figures for it.
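The sensor, actuator, and guarded-command structure described in the abstract can be illustrated with a minimal single-process sketch. This is not the Meta toolkit's interface: the names Component, GuardedCommand, control_step, and run_until_quiescent are hypothetical, and distribution of the commands, consistency handling, and replication are all left out.

```python
from dataclasses import dataclass
from typing import Callable, List


class Component:
    """Hypothetical application component (the 'environment')."""

    def __init__(self, name: str, load: float) -> None:
        self.name = name
        self.load = load
        self.workers = 1

    def sense_load(self) -> float:
        # Sensor: reads application state without modifying it.
        return self.load

    def add_worker(self) -> None:
        # Actuator: modifies application state.
        # Assumption for illustration: each added worker halves the load.
        self.workers += 1
        self.load = self.load / 2


@dataclass
class GuardedCommand:
    """A guard (condition over sensors) paired with an action (actuator calls)."""
    guard: Callable[[], bool]
    action: Callable[[], None]


def control_step(commands: List[GuardedCommand]) -> int:
    """Evaluate every guard once; run the action of each guard that holds.

    Returns the number of commands that fired.
    """
    fired = 0
    for cmd in commands:
        if cmd.guard():
            cmd.action()
            fired += 1
    return fired


def run_until_quiescent(commands: List[GuardedCommand], max_steps: int = 10) -> int:
    """Repeat control steps until no guard holds or max_steps is reached.

    Returns the number of steps in which at least one command fired.
    """
    steps = 0
    while steps < max_steps and control_step(commands) > 0:
        steps += 1
    return steps
```

A control layer in this sketch would be built by pairing a condition over a sensor with an actuator call, for example GuardedCommand(lambda: server.sense_load() > 1.0, server.add_worker), and passing a list of such commands to run_until_quiescent. In a real deployment the guards would be evaluated close to the programs they observe, which is where the consistency concerns mentioned in the abstract arise.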