Locality Enhancement Of Imperfectly-nested Loop Nests
Most numerical applications using arrays require extensive program transformation in order to perform well on current machine architectures with deep memory hierarchies. These transformations ensure that an execution of the application exploits data-locality and uses the caches more effectively. The problem of exploiting data-locality is well understood only for a small class of applications -- for programs in which all statements are present in the innermost loop of a loop-nest (called perfectly-nested loops). For such programs, statement instances can be mapped to an integer lattice (called the iteration space), and important transformations can be modelled as unimodular transformations of the iteration space. This framework has permitted the systematic application of transformations like loop-permutation, skewing and tiling in order to enhance locality in perfectly-nested loops. In dealing with programs that do not fall into this category, current compilers resort to ad-hoc techniques to find the right sequence of transformations. For some important benchmarks, no technique is known that will discover the right sequence of transformations. In my thesis, I propose a technique that extends the framework for perfectly-nested loops to general programs. The key idea is to embed the iteration space of every statement in the program into a special iteration space called the product space. The product space can be viewed as a perfectly-nested loop nest, so this embedding generalizes techniques like code sinking and loop fusion that are used in ad hoc ways in current compilers to produce perfectly-nested loops from imperfectly-nested ones. In contrast to these ad hoc techniques however, embeddings are chosen carefully to enhance locality. The product space is then transformed further using unimodular transformations, after which fully permutable loops are tiled, and code is generated. Code can also be generated to emulate block-recursive versions of the original program. I demonstrate the effectiveness of this approach for dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPECfp95 benchmark suite.
computer science; technical report
Previously Published As