Scientific Computing on Multicore Architectures

University dissertation from Uppsala : Acta Universitatis Upsaliensis

Abstract: Computer simulations are an indispensable tool for scientists to gain new insights about nature. Simulations of natural phenomena are usually large, and limited by the available computer resources. By using the computer resources more efficiently, larger and more detailed simulations can be performed, and more information can be extracted to help advance human knowledge.The topic of this thesis is how to make best use of modern computers for scientific computations. The challenge here is the high level of parallelism that is required to fully utilize the multicore processors in these systems.Starting from the basics, the primitives for synchronizing between threads are investigated. Hardware transactional memory is a new construct for this, which is evaluated for a new use of importance for scientific software: atomic updates of floating point values. The evaluation includes experiments on real hardware and comparisons against standard methods.Higher level programming models for shared memory parallelism are then considered. The state of the art for efficient use of multicore systems is dynamically scheduled task-based systems, where tasks can depend on data. In such systems, the software is divided up into many small tasks that are scheduled asynchronously according to their data dependencies. This enables a high level of parallelism, and avoids global barriers.A new system for managing task dependencies is developed in this thesis, based on data versioning. The system is implemented as a reusable software library, and shown to be as efficient or more efficient than other shared-memory task-based systems in experimental comparisons.The developed runtime system is then extended to distributed memory machines, and used for implementing a parallel version of a software for global climate simulations. By running the optimized and parallelized version on eight servers, an equally sized problem can be solved over 100 times faster than in the original sequential version. The parallel version also allowed significantly larger problems to be solved, previously unreachable due to memory constraints.

  CLICK HERE TO DOWNLOAD THE WHOLE DISSERTATION. (in PDF format)