Parallel Algorithms and Library Software for the Generalized Eigenvalue Problem on Distributed Memory Computer Systems

University dissertation from Umeå: Umeå universitet

Abstract: In this thesis, we present and discuss algorithms and library software for solving the generalized non-symmetric eigenvalue problem (GNEP) on high-performance computing (HPC) platforms with distributed memory. GNEPs occur frequently in many different types of problems, and our solution makes it possible to solve these problems quickly and accurately in parallel on state-of-the-art HPC systems. We formulate the GNEP as solving Ax = λBx (x ≠ 0), where A and B are real square matrices, using a two-stage method. In the first stage the matrix pair (A, B) is reduced to a Hessenberg-triangular pair (H, T) in a finite number of steps. The second stage reduces the matrix pair further to generalized Schur form, using an iterative method that produces a triangular pair (S, T). Once the (S, T) pair is derived, the desired eigenvalues and eigenvectors of (A, B) can easily be computed. The algorithm for the first stage uses low-level operations to perform the actual reduction. However, during the reduction a delayed-update technique is used, so that several updates are accumulated and later applied in a blocked fashion; together with a task scheduler, this makes the algorithm scale in a parallel setting. The potential presence of infinite eigenvalues requires the algorithm for the second stage to continuously scan for and robustly deflate infinite eigenvalues, so that they neither interfere with the finite eigenvalues nor are misinterpreted as finite entries. Two other topics covered for the second stage are aggressive early deflation, which radically speeds up convergence towards the generalized Schur form, and the use of several independent chains of tightly coupled bulges, which makes the algorithm scalable in parallel. The algorithms have been evaluated on several HPC platforms, and performance is demonstrated and evaluated using up to 1600 CPU cores for problems with matrices as large as 100000 × 100000.
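As a minimal sketch of what a GNEP and its infinite eigenvalues look like, the pure-Python helper below (a hypothetical function invented for this illustration, not part of the thesis software) solves a 2 × 2 case directly from the characteristic polynomial det(A − λB) = 0. It reports each eigenvalue as a pair (α, β) with λ = α/β, the convention of generalized Schur form, where β = 0 marks an infinite eigenvalue arising from a singular B:

```python
import math

def gnep_2x2(A, B):
    """Solve the 2x2 GNEP A x = lam * B x via det(A - lam*B) = 0.

    Returns eigenvalues as (alpha, beta) pairs with lam = alpha/beta;
    beta == 0 marks an infinite eigenvalue.  Illustration only: real
    solvers reduce (A, B) to generalized Schur form (S, T) instead of
    expanding the determinant, and this sketch assumes real eigenvalues.
    """
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # det(A - lam*B) = p2*lam^2 + p1*lam + p0
    p2 = b11 * b22 - b12 * b21                       # det(B)
    p1 = -(a11 * b22 + a22 * b11 - a12 * b21 - a21 * b12)
    p0 = a11 * a22 - a12 * a21                       # det(A)
    if p2 == 0:
        # B singular: the polynomial degenerates to degree 1, so one
        # eigenvalue escapes to infinity (beta = 0).
        return [(-p0 / p1, 1.0), (1.0, 0.0)]
    disc = math.sqrt(p1 * p1 - 4.0 * p2 * p0)        # assumed non-negative
    return [((-p1 + disc) / (2 * p2), 1.0),
            ((-p1 - disc) / (2 * p2), 1.0)]
```

For example, with B = I the problem reduces to an ordinary eigenvalue problem, while a singular B such as [[1, 0], [0, 0]] yields one finite and one infinite eigenvalue, illustrating why the second-stage algorithm must deflate infinite eigenvalues robustly.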
Software related to the reduction is described in a User Guide, so that end users can build and use the software on their own HPC platforms. To ensure proper usage of the software, the calling sequences of the main driver routines, with their input and output parameters, are described in detail. The software is optionally tunable via a set of parameters for various thresholds, buffer sizes, etc. These parameters are listed and discussed, and for each a recommended value is given that should yield reasonable performance on systems similar to the ones we have run on.
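The pattern of recommended defaults with optional overrides can be sketched as follows. All parameter names and values here are invented for illustration; the actual parameters and their recommended values are those listed in the User Guide:

```python
# Hypothetical tunables, mimicking the kind of thresholds and buffer
# sizes the User Guide describes (names and values are placeholders).
DEFAULT_PARAMS = {
    "block_size": 64,             # width of the blocked delayed updates
    "deflation_window": 96,       # window for aggressive early deflation
    "infinity_threshold": 1e-12,  # below this, a diagonal entry of T
                                  # flags an infinite eigenvalue
}

def with_overrides(**overrides):
    """Start from the recommended defaults, overriding selected entries."""
    params = dict(DEFAULT_PARAMS)
    params.update(overrides)
    return params
```

A caller who wants to tune only one threshold would write, e.g., `with_overrides(block_size=128)`, keeping the remaining recommended values intact.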
