Capacity allocation mechanisms for grid environments

University dissertation from Umeå : Datavetenskap

Abstract: During the past decade, Grid computing has gained popularity as a means to build powerful computing infrastructures by aggregating distributed computing capacity. Grid technology allows computing resources that belong to different organizations to be integrated into a single unified system image – a Grid. As such, Grid technology constitutes a key enabler of large-scale, cross-organizational sharing of computing resources. An important objective for the Virtual Organizations (VOs) that result from such sharing is to tame the distributed capacity of the Grid in order to manage it and make fair and efficient use of the pooled computing resources.
Most Grids to date have, however, been completely unregulated, essentially serving as a “source of free CPU cycles” for authorized Grid users. Whenever unrestricted access to a shared resource is permitted, there is a risk of overexploitation and degradation of the common resource, a phenomenon often referred to as “the tragedy of the commons”. This thesis addresses this problem by presenting two complementary Grid capacity allocation systems that allow the aggregate computing capacity of a Grid to be divided between users in order to protect the Grid from overuse while delivering fair service that satisfies the individual computational needs of different user groups.
These two Grid capacity allocation mechanisms constitute the core contribution of this thesis. The first mechanism, the SweGrid Accounting System (SGAS), addresses the need for coordinated soft, real-time quota enforcement across Grid sites. The SGAS project was an early adopter of the service-oriented principles that are now common practice in the Grid community, and the system has been tested in the SweGrid production environment. Furthermore, SGAS has been included in the Globus Toolkit, the de facto standard Grid middleware toolkit.
SGAS employs a credit-based allocation model where research projects are granted quota allowances that can be spent across the Grid resources, which charge users for their resource consumption. Enforcing these usage limits provides real-time protection against overuse.
The second approach, employed by the Fair Share Grid (FSGrid) system, uses a share-based allocation model where project entitlements are expressed in terms of hierarchical share policies that logically divide the Grid capacity between user groups. By coordinating local job scheduling to maintain these global capacity shares, the Grid resources collectively strive to schedule users for a “share of the Grid”. We refer to this cooperative scheduling model as decentralized Grid-wide fairshare scheduling.
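The two allocation models described above can be contrasted in a minimal sketch. This is an illustration only, not the SGAS or FSGrid implementation: the names `ProjectAccount`, `ShareNode` and `target_shares`, the CPU-hour unit, and the choice to record a charge even when it exceeds the allowance (one possible reading of "soft" enforcement) are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProjectAccount:
    """Credit-based model (hypothetical): a project draws down a granted quota."""
    allowance: float      # granted quota, assumed here to be in CPU-hours
    used: float = 0.0     # consumption charged so far by all Grid resources

    def charge(self, cost: float) -> bool:
        """Record a job's consumption and report whether the project is
        still within its allowance. The charge is always recorded; what to
        do on overdraft is left to the caller (assumed soft enforcement)."""
        self.used += cost
        return self.used <= self.allowance


@dataclass
class ShareNode:
    """Share-based model (hypothetical): one node of a hierarchical policy."""
    name: str
    weight: float                                   # relative to siblings
    children: list[ShareNode] = field(default_factory=list)


def target_shares(node: ShareNode, parent_share: float = 1.0) -> dict[str, float]:
    """Return the fraction of total Grid capacity each leaf is entitled to.

    Each child receives its parent's share scaled by the child's weight
    divided by the sum of its siblings' weights.
    """
    if not node.children:
        return {node.name: parent_share}
    total = sum(child.weight for child in node.children)
    shares: dict[str, float] = {}
    for child in node.children:
        shares.update(target_shares(child, parent_share * child.weight / total))
    return shares


# Example policy: physics gets 3 parts, biology 1 part; physics splits evenly.
policy = ShareNode("grid", 1, [
    ShareNode("physics", 3, [ShareNode("alice", 1), ShareNode("bob", 1)]),
    ShareNode("biology", 1),
])
```

In the credit-based sketch, a resource would call `charge` after (or before) running a job and decide locally how to react to an overdraft. In the share-based sketch, `target_shares(policy)` yields the global entitlements that each site's local scheduler would try to maintain, for instance by prioritizing users whose recent usage falls below their target share.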