Access Normalization: Loop Restructuring for NUMA Compilers
Li, Wei; Pingali, Keshav
A common feature of many scalable parallel machines is non-uniform memory access: a processor can access data in its local memory ten to a thousand times faster than it can access non-local data. In addition, when a number of remote accesses must be made, it is usually more efficient to use block transfers of data than to use many small messages. To run well on such machines, software must exploit these features. We believe it is too onerous for a programmer to do this by hand, so we have been exploring the use of restructuring compiler technology for this purpose. In this paper, we start with a language like FORTRAN-D with user-specified data distributions and develop a systematic loop transformation strategy called access normalization that restructures loop nests to exploit both locality and block transfers whenever possible. We demonstrate the power of our techniques using routines from the BLAS (Basic Linear Algebra Subprograms) library. Our loop transformation strategy is expressed in the framework of invertible matrices and integer lattice theory, and it is an important generalization of Banerjee's framework of unimodular matrices.
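The following is a minimal sketch of why moving from unimodular to general invertible matrices involves integer lattices. It is an illustration only and is not taken from the paper. The 3x3 iteration space, the two example matrices, and the helper names transform and det2 are all assumptions chosen for this example. A unimodular matrix (determinant +1 or -1) maps the integer iteration points onto integer points without leaving gaps along the new loop axes, while an invertible integer matrix with a larger determinant maps them onto a sparser lattice, so the transformed loop must step with a non-unit stride.

```python
# Illustrative sketch only: matrices and names are assumptions, not the paper's.

def det2(T):
    """Determinant of a 2x2 integer matrix given as nested lists."""
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]

def transform(points, T):
    """Apply the 2x2 matrix T to every iteration vector (i, j)."""
    return [(T[0][0] * i + T[0][1] * j, T[1][0] * i + T[1][1] * j)
            for i, j in points]

# A small two-deep loop nest: for i in 0..2, for j in 0..2.
iteration_space = [(i, j) for i in range(3) for j in range(3)]

skew = [[1, 1], [0, 1]]    # unimodular: determinant is 1
scale = [[2, 0], [0, 1]]   # invertible but not unimodular: determinant is 2

skewed = transform(iteration_space, skew)
scaled = transform(iteration_space, scale)

# Values taken by the new outer loop index under each transformation.
skewed_outer = sorted({u for u, _ in skewed})
scaled_outer = sorted({u for u, _ in scaled})

print("det(skew)  =", det2(skew), "outer index values:", skewed_outer)
print("det(scale) =", det2(scale), "outer index values:", scaled_outer)
```

Under the skew, the new outer index takes every integer in its range, so a unit-stride loop suffices. Under the scaling, only even values occur, so the image is a proper sublattice of the integer points and the generated loop needs a stride of 2.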