Exploring Software Resilience

Abstract: Software has, for better or worse, become a core component in the structured management and manipulation of vast quantities of information, and is therefore central to many crucial services and infrastructures. However, hidden among the various benefits that the inclusion of software may bring is the potential for unwanted and unforeseen interactions, ranging from mere annoyances all the way up to full-blown catastrophes. Overcoming adversities of this nature is a challenge shared with other engineering ventures, and there are many developed strategies that work towards eliminating various kinds of disturbances, assuming that it is possible to apply such strategies correctly. One approach in this regard is to accept some anomalous behaviors as mere facts of life and make sure that the situations experienced are dealt with in an expeditious manner, while at the same time trying to discover, implement and improve safeguards that can lessen adverse consequences in the event of future problems; in short, to embed resilience. The work described in this thesis explores the foundations of software resilience, and thus covers the main resilience-enabling mechanisms, along with the supporting tools, techniques and methods used to embed resilience. These instruments are dissected and analyzed from the perspective of stakeholders who have to operate on pre-existing, critical, large and heterogeneous subjects that are to some extent already up and running at the point of instrumentation. Finally, the thesis presents a demonstrator environment for self-healing activities in a partially damaged power grid, together with its construction details and the initial results of the study conducted in this environment.
