Data-driven Performance Prediction and Resource Allocation for Cloud Services
Abstract: Cloud services, which provide online entertainment, enterprise resource management, tax filing, etc., are becoming essential for consumers, businesses, and governments. The key functionalities of such services are provided by backend systems in data centers. This thesis focuses on three fundamental problems related to management of backend systems. We address these problems using data-driven approaches: triggering dynamic allocation by changes in the environment, obtaining configuration parameters from measurements, and learning from observations.

The first problem relates to resource allocation for large clouds with potentially hundreds of thousands of machines and services. We developed and evaluated a generic gossip protocol for distributed resource allocation. Extensive simulation studies suggest that the quality of the allocation is independent of the system size for the management objectives considered.

The second problem focuses on performance modeling of a distributed key-value store, and we study specifically the Spotify backend for streaming music. We developed analytical models for system capacity under different data allocation policies and for response time distribution. We evaluated the models by comparing model predictions with measurements from our lab testbed and from the Spotify operational environment. We found the prediction error to be below 12% for all investigated scenarios.

The third problem relates to real-time prediction of service metrics, which we address through statistical learning. Service metrics are learned from observing device and network statistics. We performed experiments on a server cluster running video streaming and key-value store services. We showed that feature set reduction significantly improves the prediction accuracy, while simultaneously reducing model computation time. Finally, we designed and implemented a real-time analytics engine, which produces model predictions through online learning.
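To illustrate the general idea behind gossip-based resource allocation mentioned for the first problem, the following is a minimal sketch of pairwise gossip averaging, a standard textbook technique. It is not the protocol developed in the thesis: the function names `gossip_balance` and `imbalance`, the notion of a scalar "load" per node, and the uniform random peer selection are all assumptions made for illustration only.

```python
import random


def gossip_balance(loads, rounds, seed=0):
    """Pairwise gossip averaging (illustrative, not the thesis protocol).

    In each round, every node contacts one uniformly chosen peer and the
    two nodes split their combined load evenly. Requires at least 2 nodes.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    loads = list(loads)        # work on a copy; the input is not modified
    n = len(loads)
    if n < 2:
        raise ValueError("need at least two nodes to gossip")
    for _ in range(rounds):
        for i in range(n):
            # Pick a peer j different from i.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Both nodes move to the average of their two loads,
            # so the total load in the system is preserved.
            avg = (loads[i] + loads[j]) / 2.0
            loads[i] = loads[j] = avg
    return loads


def imbalance(loads):
    """Gap between the most and least loaded node (hypothetical metric)."""
    return max(loads) - min(loads)


if __name__ == "__main__":
    # Hypothetical starting point: one overloaded node among 64.
    initial = [100.0] + [0.0] * 63
    balanced = gossip_balance(initial, rounds=20)
    print("imbalance before:", imbalance(initial))
    print("imbalance after: ", imbalance(balanced))
```

Each exchange uses only local information from two nodes, which is why schemes of this kind need no central coordinator; how well the thesis protocol scales is established by its own simulation studies, not by this sketch.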