Shared Resources in Distributed Systems: Analytical Tools for Evaluation and Self-stabilizing Provisioning

Abstract: Distributed computing is an established paradigm of modern computing systems. The nodes of a distributed system interact either by sharing resources or via a communication network. In both cases, provisioning shared resources is a challenge, for example when resource demand and supply vary or when the system is prone to failures. Analytical tools for evaluating system performance and for provisioning shared resources enhance system design and implementation. In this thesis, we develop analytical tools for the evaluation and self-stabilizing provisioning of shared resources in distributed systems. We first focus on systems where resource demand and supply vary, and study cases of reusable and non-reusable resources. We study shared-object systems, in which system nodes continuously demand mutually exclusive access to a number of objects. We develop analytical tools for computing the expected delay and throughput of such systems across a wide range of system utilization scenarios, including saturation points. Moreover, we study systems where nodes share energy resources, and focus on optimizing the available resources at the system level. We develop online algorithms that exploit the flexibility in resource demand to optimize the utilization of the available supply, and prove their competitive ratios.

Recovery from failures is necessary for provisioning shared resources. Dynamic and complex systems are often designed based on a failure model, but it is important that they recover even after unexpected failures that lie outside that model. Such failures can include topological changes in the network, stale information in the nodes' memory, and communication failures, among others. These failures are further amplified by the system's asynchrony. In these settings, we first focus on provisioning network resources, in terms of network control and the ordering of distributed events.
We study Software-Defined Networks (SDNs), and specifically their control planes. We provide a self-stabilizing distributed algorithm for a fault-tolerant SDN control plane that deals with communication failures and topological changes, as well as with transient faults that can bring the system into an arbitrary state. Moreover, we focus on ordering distributed events in asynchronous message-passing systems in the absence of execution fairness. Under such extreme asynchrony, we provide a practically-self-stabilizing distributed algorithm that uses bounded memory and yet tolerates transient faults as well as concurrent counter overflows when counting distributed events.