QR Factorization Algorithms for Coarse-Grained Distributed Systems
Bischof, Christian H.
We present the techniques of adaptive blocking and incremental condition estimation which we believe to be useful for the computation of common matrix decompositions in high-performance environments. We apply these new techniques to algorithms for computing the Householder QR factorization with and without pivoting on a coarse-grained distributed system. For reasons of portability, we use a pipelined scheme on a ring of processors as the basis of our algorithms. To take advantage of possible floating point hardware on each node we develop a blocked version of the pipelined Householder QR algorithm that employs the compact WY representation for products of Householder matrices. While a strategy involving blocks of fixed width leads to increased floating point utilization per node, it also leads to increased load imbalance. To reconcile this tradeoff we introduce a variable width block strategy based on a model of the critical path of the algorithm. The resulting adaptive blocking strategy provides for good floating point performance per node while maintaining overall load balance. Experimental results on the Intel iPSC hypercube show that the adaptive blocking strategy performs indeed better than any fixed width blocking strategy. In the second part of our thesis we develop methods for introducing pivoting into the distributed QR factorization algorithm. Incorporating the traditional column pivoting strategy in a straightforward manner introduces a global synchronization constraint which results in increased communication overhead. A strictly local pivoting scheme avoids the resulting loss in efficiency, but has to be monitored for reliability. To this end, we introduce an incremental condition estimator which allows us to update the estimate of the smallest singular value of an upper triangular matrix $R$ as new columns are added to $R$. The update requires only $O(n)$ flops an the storage of $O(n)$ words between successive steps. Experiments indicate that the incremental condition estimator is reliable despite its small computational cost. Using the incremental condition estimator we are then able to guard against the selection of troublesome pivot columns in our local pivoting scheme at little extra cost. Simulation results show that the resulting algorithm is about as reliable as the traditional QR factorization algorithm with column pivoting.
computer science; technical report
Previously Published As