Modeling and Control for Improved Predictability of Cloud Applications

Abstract: Cloud computing has emerged as a key technology over the last decade and continues to be applied to manage the computing needs of new domains. As a result, the requirements on predictable behavior in the cloud are increasing, and hosted applications need to be recognized as both fault-tolerant and responsive even under difficult conditions.

In this thesis, new modeling methods and decision-making strategies are presented with the goal of increasing the predictability of cloud applications. The methods can be divided into two tracks, using concepts from control theory and queuing theory respectively. The control-theoretical track utilizes the concept of graceful degradation as an enabling actuator. In the context of server control, a novel dynamic model for queue lengths is proposed, as well as a cascaded structure for response time control. Additionally, interactions between decision-making strategies at different layers in the cloud infrastructure are discussed, including an interpretation of the popular Join-Shortest-Queue (JSQ) load-balancing strategy as a queue length controller. The queuing-theoretical track utilizes the concept of request cloning to increase the predictability of applications replicated across multiple servers. A criterion for synchronized service is formalized, which enables a dramatic simplification of the modeling of applications subject to cloning, without requiring any further assumptions on either the queuing disciplines or the statistical distributions involved. Furthermore, model error bounds are derived for server systems that break the synchronized service criterion. It is shown that imperfections that can arise during implementation only slightly affect the accuracy of the model. Finally, an intuitive explanation is given for why the popular JSQ load-balancing strategy acts as a service synchronizer, which allows for accurate approximate modeling of the complicated scenario of unrestricted request cloning across replicated servers where JSQ is used for load balancing.

While the modeling approaches differ between the two method tracks, they share common properties that run throughout the thesis. First, the majority of the involved techniques revolve around finding design choices that enable simplification without limiting the applicability of the solutions. Second, many of the strategies presented in the thesis apply concepts and structures traditionally used in different domains, which often requires the problems to be viewed from a slightly different angle. The proposed models and methods from both tracks are evaluated in a simulated cloud environment, composed of a discrete-event simulator that is implemented in a request-by-request fashion and is independent of the methods proposed in this thesis.
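As a rough illustration of the JSQ strategy mentioned in the abstract, the following minimal sketch routes each incoming request to the server whose queue is currently shortest. It is not taken from the thesis: the function names `jsq_pick` and `dispatch` are hypothetical, ties are assumed to go to the lowest server index, and departures are ignored so that only the routing rule is shown.

```python
from typing import List


def jsq_pick(queue_lengths: List[int]) -> int:
    """Return the index of the server with the shortest queue.

    Assumption for this sketch: ties are broken in favor of the lowest index.
    """
    return min(range(len(queue_lengths)), key=lambda i: queue_lengths[i])


def dispatch(queue_lengths: List[int], n_requests: int) -> List[int]:
    """Route n_requests arrivals one by one using JSQ.

    No requests depart in this sketch, so the result only shows how
    arrivals are distributed across the servers.
    """
    lengths = list(queue_lengths)  # work on a copy, leave the input untouched
    for _ in range(n_requests):
        lengths[jsq_pick(lengths)] += 1
    return lengths


if __name__ == "__main__":
    # Three servers with unequal backlogs receive four new requests.
    print(dispatch([3, 0, 1], 4))
```

In this toy run the arrivals go to the two shorter queues until all three backlogs are within one request of each other, which gives some intuition for why JSQ can be read as a controller that drives queue lengths toward a common value.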
