Area-Universal Interconnection Networks for VLSI Parallel Computers
A central issue in the design of a general-purpose parallel computer is the choice of an interconnection network and an associated algorithm for routing messages through it. The main results of this thesis are two new interconnection networks, the pruned butterfly and the sorting fat-tree, together with deterministic routing algorithms for them. Both networks are area-universal, i.e., they can simulate any other routing network that fits in similar VLSI chip area with only polylogarithmic slowdown. Previous area-universal networks either addressed the off-line problem, where the message set to be routed is known in advance and substantial precomputation is permitted, or relied on randomization, yielding results that hold only with high probability. The two networks introduced here are the first that are simultaneously deterministic and on-line, and they use two substantially different routing techniques. The performance of the routing algorithms depends on the difficulty of the problem instance, which is measured by a quantity $\lambda$ known as the load factor. The pruned butterfly algorithm runs in time $O(\lambda \log^2 N)$, where $N$ is the number of possible sources and destinations for messages and $\lambda$ is assumed to be polynomial in $N$. The sorting fat-tree algorithm runs in time $O(\lambda \log N + \log^2 N)$ for a restricted class of message sets that includes partial permutations. Several related results are also presented in this thesis. A nontrivial lower bound on wire area is proven for a class of tree-based networks that do not modify the content of messages; these networks are shown to be subject to an area-time tradeoff. This lower bound implies that the sorting fat-tree's area-time performance is optimal for a wide range of values of $\lambda$. Other results of this work include a new type of sorting circuit and an area-universal VLSI circuit.
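To give a feel for how the two stated running times compare, the following minimal sketch evaluates the expressions inside the $O(\cdot)$ bounds for a few load factors. It is only an illustration under stated assumptions: the function names are hypothetical, hidden constant factors are dropped, logarithms are taken base 2, and the sorting fat-tree expression applies only to the restricted class of message sets mentioned above.

```python
import math


def pruned_butterfly_bound(load_factor: float, n: int) -> float:
    """Evaluate lambda * log^2 N (constants dropped; base-2 log is an assumption)."""
    return load_factor * math.log2(n) ** 2


def sorting_fat_tree_bound(load_factor: float, n: int) -> float:
    """Evaluate lambda * log N + log^2 N (constants dropped; base-2 log is an assumption)."""
    lg = math.log2(n)
    return load_factor * lg + lg ** 2


if __name__ == "__main__":
    n = 1024  # hypothetical number of sources/destinations
    for lam in (1, 10, 100):  # hypothetical load factors
        pb = pruned_butterfly_bound(lam, n)
        sft = sorting_fat_tree_bound(lam, n)
        print(f"lambda={lam:>3}  pruned butterfly ~ {pb:8.0f}  sorting fat-tree ~ {sft:8.0f}")
```

Because the constants are omitted, the numbers indicate growth rates only, not actual routing times: as $\lambda$ grows, the $\lambda \log N$ term dominates the second expression, so the two expressions differ by roughly a factor of $\log N$.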