foundations for system architectures and algorithms for creating truly robust autonomic systems -- systems that
are able to recover automatically from unexpected failures. Our approaches complement each other, starting with
the case of given black-box systems, continuing with the process of developing new systems, and concluding
with the case of automatic creation of recovery-oriented software.
In the first part we consider software packages to be black boxes. We propose modeling software package flaws (bugs)
by assuming eventual Byzantine behavior of the package. We propose a general yet practical framework and paradigm for
monitoring and recovering systems, called the autonomic recoverer. The framework receives task-specific
requirements in the form of safety and liveness predicates and recovery actions. The autonomic recoverer uses a new
scheme for liveness assurance via on-line monitoring that complements known schemes for on-line safety assurance.
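
As an illustration only, the following minimal Python sketch shows how a monitor might take a safety predicate, a progress (liveness) predicate, and a recovery action as inputs. The class name, the `deadline` parameter, and the bounded-wait treatment of liveness are assumptions made for this example; they are not the liveness scheme referred to above.

```python
from typing import Callable, Dict

State = Dict[str, int]


class AutonomicRecoverer:
    """Hypothetical monitor (illustrative names, not the authors' design).

    Safety is checked on every observation. Liveness is approximated by
    requiring the progress predicate to hold at least once in every
    window of `deadline` observations.
    """

    def __init__(
        self,
        safety: Callable[[State], bool],
        progress: Callable[[State], bool],
        recover: Callable[[State], State],
        deadline: int,
    ) -> None:
        self.safety = safety
        self.progress = progress
        self.recover = recover
        self.deadline = deadline
        self.steps_since_progress = 0
        self.recoveries = 0

    def observe(self, state: State) -> State:
        # Track how long the system has gone without visible progress.
        if self.progress(state):
            self.steps_since_progress = 0
        else:
            self.steps_since_progress += 1

        stalled = self.steps_since_progress >= self.deadline
        if not self.safety(state) or stalled:
            # Either kind of violation triggers the supplied recovery action.
            self.recoveries += 1
            self.steps_since_progress = 0
            return self.recover(state)
        return state


# Assumed task-specific requirements for a toy queueing service.
monitor = AutonomicRecoverer(
    safety=lambda s: s["queue"] <= 10,
    progress=lambda s: s["served"] > 0,
    recover=lambda s: {"queue": 0, "served": 0},
    deadline=3,
)
```

Feeding the monitor a state with `queue` above 10 triggers recovery immediately, while three consecutive observations with `served` equal to 0 trigger it through the bounded-wait check.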
In the second part we consider a software package to be a transparent box and introduce recovery-oriented programming:
programs include important safety and liveness properties and recovery actions as an integral part of the program.
We design a pre-compiler that produces augmented code for monitoring the properties and executing the recovery actions
upon a property violation. Assuming the restartability property of a given program, the resulting code is able to overcome
safety and liveness violations. We provide a proof scheme for showing that the code the pre-compiler produces
from the program, together with its important properties and recovery actions, fulfills its specifications
when started in an arbitrary state.
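
The idea of augmenting a program with property checks can be sketched in Python, under loud assumptions: a decorator stands in for the pre-compiler, only a safety invariant is monitored (liveness is omitted), and the recovery action is a plain restart. All names here are hypothetical.

```python
from typing import Callable, Dict

State = Dict[str, int]
Step = Callable[[State], State]


def recovery_oriented(
    invariant: Callable[[State], bool],
    recovery: Callable[[State], State],
) -> Callable[[Step], Step]:
    """Hypothetical stand-in for the pre-compiler: returns an augmented
    step function that checks the invariant before and after each step."""

    def augment(step: Step) -> Step:
        def augmented(state: State) -> State:
            # The program may be started in an arbitrary state, so the
            # invariant is checked before the original code runs.
            if not invariant(state):
                state = recovery(state)
            new_state = step(state)
            # A violation caused by the step itself is also repaired.
            if not invariant(new_state):
                new_state = recovery(new_state)
            return new_state

        return augmented

    return augment


def restart(_state: State) -> State:
    # Assumed recovery action: restart from a known initial state.
    return {"counter": 0}


@recovery_oriented(
    invariant=lambda s: 0 <= s.get("counter", -1) < 5,
    recovery=restart,
)
def step(state: State) -> State:
    # The original program logic: advance a bounded counter.
    return {"counter": state["counter"] + 1}
```

Calling `step` on a corrupted state such as `{"counter": -42}` first restarts and then advances the counter, so the returned state satisfies the invariant again.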
Finally, in the third part we consider a highly dynamic environment, which typically implies that there are no realizable
specifications for the environment, i.e., there does not exist a program that respects the specifications for every given
environment. In such cases the predefined recovery action may not suffice and a dramatic change in the program is required.
We suggest searching for a program at run time by trying all possible programs on plant replicas in parallel, where the plant
is the relevant part of the environment. We present control search algorithms for various plant state settings
(reflection and the ability to set the plant to a certain state).
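
A toy Python sketch of searching over candidate programs on plant replicas follows. It assumes the most favorable setting, in which the plant state can be read and a replica can be set to that state, and it enumerates a small finite family of candidates rather than all programs. The `Plant` class, the goal band of 4 to 6, and the constant-action candidates are invented for illustration and do not reproduce the control search algorithms themselves.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

Program = Callable[[int], int]


class Plant:
    """Hypothetical toy plant: an integer level that drifts up by 1 per step."""

    def __init__(self, level: int) -> None:
        self.level = level

    def apply(self, action: int) -> None:
        self.level += 1 + action


def evaluate(program: Program, start_level: int, horizon: int = 10) -> bool:
    # Each candidate runs on its own replica, set to the observed plant state.
    replica = Plant(start_level)
    for _ in range(horizon):
        replica.apply(program(replica.level))
        if not 4 <= replica.level <= 6:  # assumed goal: stay inside the band
            return False
    return True


def control_search(
    candidates: List[Tuple[str, Program]], start_level: int
) -> Optional[str]:
    # Try every candidate in parallel, one replica per candidate.
    with ThreadPoolExecutor() as pool:
        results = list(
            pool.map(lambda item: evaluate(item[1], start_level), candidates)
        )
    for (label, _), ok in zip(candidates, results):
        if ok:
            return label
    return None


# Candidate programs: always emit a fixed action a, for a in -3..1.
candidates: List[Tuple[str, Program]] = [
    (f"const{a}", (lambda level, a=a: a)) for a in range(-3, 2)
]
```

In this toy setting only the candidate that cancels the drift keeps the level inside the band, and the search returns `None` when no candidate in the family succeeds from the observed state.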