Skeleton Programming for Heterogeneous GPU-based Systems
Abstract: In this thesis, we address issues associated with programming modern heterogeneous systems while focusing on a special kind of heterogeneous systems that include multicore CPUs and one or more GPUs, called GPU-based systems.We consider the skeleton programming approach to achieve high level abstraction for efficient and portable programming of these GPU-based systemsand present our work on SkePU library which is a skeleton library for these systems.We extend the existing SkePU library with a two-dimensional (2D) data type and skeleton operations and implement several new applications using newly made skeletons. Furthermore, we consider the algorithmic choice present in SkePU and implement support to specify and automatically optimize the algorithmic choice for a skeleton call, on a given platform.To show how to achieve performance, we provide a case-study on optimized GPU-based skeleton implementation for 2D stencil computations and introduce two metrics to maximize resource utilization on a GPU. By devising a mechanism to automatically calculate these two metrics, performance can be retained while porting an application from one GPU architecture to another.Another contribution of this thesis is implementation of the runtime support for the SkePU skeleton library. This is achieved with the help of the StarPUruntime system. By this implementation,support for dynamic scheduling and load balancing for the SkePU skeleton programs is achieved. Furthermore, a capability to do hybrid executionby parallel execution on all available CPUs and GPUs in a system, even for a single skeleton invocation, is developed.SkePU initially supported only data-parallel skeletons. The first task-parallel skeleton (farm) in SkePU is implemented with support for performance-aware scheduling and hierarchical parallel execution by enabling all data parallel skeletons to be usable as tasks inside the farm construct.Experimental evaluations are carried out and presented for algorithmic selection, performance portability, dynamic scheduling and hybrid execution aspects of our work.
CLICK HERE TO DOWNLOAD THE WHOLE DISSERTATION. (in PDF format)