Analysing Fault Tolerance for Erlang Applications

University dissertation from Uppsala : Acta Universitatis Upsaliensis

Abstract: ERLANG is a concurrent functional language, well suited for distributed, highly concurrent and fault-tolerant software. An important part of Erlang is its support for failure recovery. Fault tolerance is provided by organising the processes of an ERLANG application into tree structures. In these structures, parent processes monitor failures of their children and are responsible for their restart. Libraries support the creation of such structures during system initialisation.A technique to automatically analyse that the process structure of an ERLANG application from the source code is presented. The analysis exposes shortcomings in the fault tolerance properties of the application. First, the process structure is extracted through static analysis of the initialisation code of the application. Thereafter, analysis of the process structure checks two important properties of the fault handling mechanism: 1) that it will recover from any process failure, 2) that it will not hide persistent errors.The technique has been implemented in a tool, and applied it to several OTP library applications and to a subsystem of a commercial system the AXD 301 ATM switch.The static analysis of the ERLANG source code is achieved through symbolic evaluation. The evaluation is peformed according to an abstraction of ERLANG’s actual semamtics. The actual semantics is formalised for a nontrivial part of the language and it is proven that the abstraction of the semantics simulates the actual semantics.

  This dissertation MIGHT be available in PDF-format. Check this page to see if it is available for download.