Analysis of enterprise IT service availability : Enterprise architecture modeling for assessment, prediction, and decision-making

Abstract: Information technology has become increasingly important to individuals and organizations alike. Not only does IT allow us to do what we always did faster and more effectively, but it also allows us to do new things, organize ourselves differently, and work in ways previously unimaginable. However, these advantages come at a cost: as we become increasingly dependent upon IT services, we also demand that they are continuously and uninterruptedly available for use. Despite advances in reliability engineering, the complexity of today's increasingly integrated systems offers a non-trivial challenge in this respect. How can high availability of enterprise IT services be maintained in the face of constant additions and upgrades, decade-long life-cycles, dependencies upon third-parties and the ever-present business-imposed requirement of flexible and agile IT services?The contribution of this thesis includes (i) an enterprise architecture framework that offers a unique and action-guiding way to analyze service availability, (ii) identification of causal factors that affect the availability of enterprise IT services, (iii) a study of the use of fault trees for enterprise architecture availability analysis, and (iv) principles for how to think about availability management.This thesis is a composite thesis of five papers. Paper 1 offers a framework for thinking about enterprise IT service availability management, highlighting the importance of variance of outage costs. Paper 2 shows how enterprise architecture (EA) frameworks for dependency analysis can be extended with Fault Tree Analysis (FTA) and Bayesian networks (BN) techniques. FTA and BN are proven formal methods for reliability and availability modeling. Paper 3 describes a Bayesian prediction model for systems availability, based on expert elicitation from 50 experts. Paper 4 combines FTA and constructs from the ArchiMate EA language into a method for availability analysis on the enterprise level. The method is validated by five case studies, where annual downtime estimates were always within eight hours from the actual values. Paper 5 extends the Bayesian prediction model from paper 3 and the modeling method from paper 4 into a full-blown enterprise architecture framework, expressed in a probabilistic version of the Object Constraint Language. The resulting modeling framework is tested in nine case studies of enterprise information systems.