We present a transformational system for extracting parallelism from programs. Our transformations generate code for synchronous parallel computers, such as Very Long Instruction Word (VLIW) and pipelined machines. The transformational system, which is based on percolation scheduling, is simple and uniform. There are four primitive transformations (three that perform code motion, plus loop unrolling), from which all parallelizing algorithms are constructed. We study our transformations as a formal system. We define a formal measure of program improvement and show that our transformations improve programs with respect to this measure. This formal approach yields a number of results on the expressive power of our transformations. Most importantly, we show that it is possible to compute limits of infinite sequences of the primitive transformations. This leads to several new algorithms for software pipelining, including an algorithm that generates optimal code for loops without tests, an algorithm for software pipelining of multiple nested loops, and a general solution to the problem of software pipelining in the presence of tests. Using the four primitives and the limit-taking transformation, it is possible to express the classical parallelization techniques for vector, multiprocessor, and VLIW machines, such as doacross, the wavefront method, loop interchange, trace scheduling, and a simple form of vectorization. Thus, our transformational system can be viewed as a formal foundation for the area of parallelization.
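To make the idea of a code-motion primitive concrete, here is a minimal sketch in Python. It is not the report's formulation. It assumes straight-line code with no tests, a hypothetical `Op` type for three-address operations, a fixed issue `width`, and VLIW semantics in which every operation in an instruction reads its operands before any of them writes. The hypothetical `can_move_up` check moves one operation into the preceding instruction only when every value read stays the same, and `compact` applies it until nothing can move. Instruction count serves as a crude stand-in for the report's improvement measure, which this abstract does not define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical three-address operation: dest := f(*srcs)."""
    dest: str
    srcs: tuple[str, ...]


def writes(instr: list[Op]) -> set[str]:
    return {op.dest for op in instr}


def reads(instr: list[Op]) -> set[str]:
    return {s for op in instr for s in op.srcs}


def can_move_up(op: Op, prev: list[Op], cur: list[Op], width: int) -> bool:
    """Sketch of a legality test for moving `op` from `cur` into `prev`.

    Assumes all ops of one instruction read before any of them writes.
    """
    siblings = [o for o in cur if o is not op]
    return (
        len(prev) < width                      # a free slot in the earlier instruction
        and not (set(op.srcs) & writes(prev))  # op must not read a value `prev` produces
        and op.dest not in writes(prev)        # no two writes to one register per instruction
        and op.dest not in reads(siblings)     # ops left in `cur` must still see the old value
    )


def compact(program: list[list[Op]], width: int) -> list[list[Op]]:
    """Move ops one instruction earlier until a fixpoint; drop empty instructions."""
    prog = [list(instr) for instr in program]  # copy so the input is not mutated
    changed = True
    while changed:
        changed = False
        for i in range(1, len(prog)):
            for op in list(prog[i]):
                if can_move_up(op, prog[i - 1], prog[i], width):
                    prog[i].remove(op)
                    prog[i - 1].append(op)
                    changed = True
        prog = [instr for instr in prog if instr]  # empty instructions are no-ops
    return prog


if __name__ == "__main__":
    # a = x + y; b = a * 2; c = x - y; d = b + c  (constants omitted from srcs)
    seq = [[Op("a", ("x", "y"))], [Op("b", ("a",))],
           [Op("c", ("x", "y"))], [Op("d", ("b", "c"))]]
    result = compact(seq, width=2)
    print([[op.dest for op in instr] for instr in result])
    # prints [['a', 'c'], ['b'], ['d']]
```

Each move shifts an operation exactly one instruction earlier, so the loop must terminate. Branches, loops, and limits of infinite transformation sequences, which are the report's actual subject, are well beyond this sketch.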
computer science; technical report