Locality-aware Scheduling and Characterization of Task-based Programs

University dissertation from Stockholm : KTH Royal Institute of Technology

Abstract: Modern computer architectures expose an increasing number of parallel features supported by complex memory access and communication structures. Currently used task scheduling techniques perform poorly since they focus solely on balancing computation load across parallel features and remain oblivious to locality properties of support structures. We contribute with locality-aware task scheduling mechanisms which improve execution time performance on average by 44\% and 11\% respectively on two locality-sensitive architectures - the Tilera TILEPro64 manycore processor and an AMD Opteron 6172 processor based four socket SMP machine.Programmers need task performance metrics such as amount of task parallelism and task memory hierarchy utilization to analyze performance of task-based programs. However, existing tools indicate performance mainly using thread-centric metrics. Programmers therefore resort to using low-level and tedious thread-centric analysis methods to infer task performance. We contribute with tools and methods to characterize task-based OpenMP programs at the level of tasks using which programmers can quickly understand important properties of the task graph such as critical path and parallelism as well as properties of individual tasks such as instruction count and memory behavior.

  CLICK HERE TO DOWNLOAD THE WHOLE DISSERTATION. (in PDF format)