Load Balanced FFT Implementations on the Intel iPSC
Two implementations of both the Cooley-Tukey and Gentelman-Sande radix-two FFT algorithms are described where the distribution of computational work among the processors is balanced, i.e. every processor does the same number of complex multiplications and additions. One method requires no extra storage space for the buffering of the complex data that is to be exchanged between processors therefore allowing "in-place" computation. The second implementation is useful for doing FFT's of symmetric sequences and an example of its use in the sine transform is presented. We also show how to simulate the second implementation in terms of the first to gain "in-place" computation without the need for buffering and intra-processor swapping.