Making reliable distributed systems in the presence of software errors

University dissertation from Kista: Mikroelektronik och informationsteknik (Microelectronics and Information Technology)

Abstract: The work described in this thesis is the result of a research program started in 1981 to find better ways of programming Telecom applications. These applications are large programs which, despite careful testing, will probably contain many errors when they are put into service. We assume that such programs do contain errors, and investigate methods for building reliable systems despite such errors.

The research has resulted in the development of a new programming language (called Erlang), together with a design methodology and a set of libraries for building robust systems (called OTP). At the time of writing, the technology described here is used in a number of major Ericsson and Nortel products. A number of small companies have also been formed which exploit the technology.

The central problem addressed by this thesis is that of constructing reliable systems from programs which may themselves contain errors. Constructing such systems imposes a number of requirements on any programming language that is to be used for the construction. I discuss these language requirements, and show how they are satisfied by Erlang.

Problems can be solved in a programming language, or in the standard libraries which accompany the language. I argue that certain of the requirements necessary to build a fault-tolerant system are solved in the language, and others in the standard libraries. Together these form a basis for building fault-tolerant software systems.

No theory is complete without proof that the ideas work in practice. To demonstrate that they do, I present a number of case studies of large, commercially successful products which use this technology. At the time of writing, the largest of these projects is a major Ericsson product with over a million lines of Erlang code. This product (the AXD301) is thought to be one of the most reliable products ever made by Ericsson.

Finally, I ask whether the goal of finding better ways to program Telecom applications was fulfilled; I also point to areas where I think the system could be improved.
