Bischof, Christian H.2007-04-232007-04-231987-09http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR87-869https://hdl.handle.net/1813/6709Jacobi methods for computing the singular value decomposition (SVD) of a matrix are ideally suited for multiprocessor environments due to their inherent parallelism. In this paper we show how a block version of the two-sided Jacobi method can be used to compute the SVD efficiently on a distributed architecture. We compare two variants of this method that differ mainly in the degree to which they diagonalize a given subproblem. The first method is a true block generalization of the scalar scheme in that each off-diagonal block is completely annihilated. The second method is a scalar Jacobi algorithm reorganized in such a manner that it conforms to the block decomposition of the problem. We have performed experiments on the Loosely Coupled Array Processor (LCAP) system at IBM Kingston which for the purposes of this article can be viewed as a ring of up to ten FPS-164/MAX array processors. These experiments show that the block Jacobi algorithm performs well on a distributed system, especially when the processors have vector processing hardware. As an example, we were able to achieve a sustained performance of 159 MFlops on a 960-by-720 SVD problem using eight processors. A surprising outcome of these experiments is that the determining factor for the performance of the algorithm on a high-performance architecture is the subproblem solver, not the communication overhead of the algorithm.1774747 bytes473469 bytesapplication/pdfapplication/postscripten-UScomputer sciencetechnical reportComputing the Singular Value Decomposition on a Distributed System of Vector Processorstechnical report