Efficient Program Analysis Using Dependence Flow Graphs
Johnson, Richard C.
Program analysis plays a major role in advanced compilers, yet traditional approaches to data flow analysis are quite time-consuming. Prior techniques for speeding up data flow analysis have either exploited program structure or used alternate "sparse" dependence representations to avoid performing unnecessary work. No general method exploiting both program structure and sparsity has emerged. We present a new framework for program analysis using dependence flow graphs (DFGs), an intermediate representation derived from dataflow machine graphs. DFGs integrate data and control dependences in a way that supports general and efficient program analysis while avoiding the problems of previous dependence representations. In particular, our approach simultaneously exploits program structure, sparsity of data flow systems, and data structure reuse to speed up analysis.
At the heart of our method is a new program decomposition based on identifying single-entry single-exit (SESE) regions of the control flow graph (CFG). We show that finding SESE regions is equivalent to the problem of finding control dependence regions, and we develop optimal linear-time algorithms for these problems. The nesting relationship of SESE regions is recorded in a data structure called the program structure tree (PST). Since each SESE region is a control flow graph in its own right, we can speed up many CFG algorithms by applying them independently to each SESE region in a divide-and-conquer manner. Additionally, SESE regions can be categorized according to their local structure (e.g., as if-then-else or loop regions), and this local structure may be exploited by applying simple syntactic methods according to the region kind. Hierarchical program structure is used in solving global data flow problems by first solving local problems within progressively larger SESE regions. The global solution is then propagated to enclosed regions in a top-down pass over the PST.
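To make the SESE decomposition concrete, the following is a minimal illustrative sketch, not the linear-time cycle-equivalence algorithm described above. It uses the classic dominance characterization (a region bounded by nodes a and b is single-entry single-exit when a dominates b and b postdominates a) on an invented diamond-shaped CFG; the graph, node names, and the naive O(n^2) fixpoint dominator computation are all assumptions made for illustration.

```python
def dominators(succ, entry):
    """Naive iterative dominator computation: dom[n] is the set of
    nodes appearing on every path from entry to n (including n)."""
    nodes = set(succ)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    pred = {n: [] for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].append(n)
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in pred[n]]
            new = ({n} | set.intersection(*ps)) if ps else {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def sese_pairs(succ, entry, exit_):
    """Return node pairs (a, b) where a dominates b and b
    postdominates a -- boundaries of candidate SESE regions."""
    dom = dominators(succ, entry)
    # Postdominance is dominance on the reversed CFG.
    rev = {n: [] for n in succ}
    for n, ss in succ.items():
        for s in ss:
            rev[s].append(n)
    pdom = dominators(rev, exit_)
    return [(a, b) for a in succ for b in succ
            if a != b and a in dom[b] and b in pdom[a]]

# An if-then-else (diamond) CFG: entry -> {then, else} -> join -> exit
cfg = {
    "entry": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["exit"],
    "exit": [],
}
print(sese_pairs(cfg, "entry", "exit"))
```

On this graph the candidate boundaries are (entry, join), (entry, exit), and (join, exit); nesting such pairs by containment is what yields the program structure tree, with the if-then-else region (entry, join) enclosed in the whole-procedure region.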
Sparsity is exploited within each data flow problem by avoiding propagation of data flow values through SESE regions that do not affect the data flow solution. We show that the sparsity found in common scalar optimizations is captured precisely by dependence flow graphs; the DFG is then viewed as a reusable "basis set" of sparse graphs. Since sparsity is based on SESE regions of the PST, we may exploit structure and sparsity simultaneously. By solving data flow systems using the DFG, we avoid the cost of rediscovering structure and sparsity for each problem instance. In this case, the cost of building the dependence flow graph is amortized over many related data flow problems. Such data structure reuse is essential to realizing the full potential of sparse data flow methods. However, reuse adds the burden of maintaining the DFG as the program is transformed during optimization. Using the PST, we derive practical incremental algorithms for updating both the DFG and the PST itself. We demonstrate the value of our approach through measurements taken from an optimizing FORTRAN compiler that uses the program structure tree and dependence flow graph as its internal representation. From our experiments, we conclude that sparse methods hold great potential for speeding up data flow analysis, but that reusing and incrementally updating a sparse representation is the key to realizing this potential.
computer science; technical report