EFFICIENT HARDWARE FOR MULTI-WAY JUMPS AND PREFeTCHES Kevin Karplus and Alexandru Nicolau Department of Computer Science Cornell University Ithaca, NY 14853 Introduction Two recent trends in computer architecture have been increasing the size and complexity of microprograms: RISC machines, array processors, and VLIW machines are programmed directly in microcode, and CISC machines have large microcode programs that interpret higher-level machine instructions. The difficulty of developing and maintaining large microprograms suggests that they should be written in a high-level language and compiled by optimizing compilers. Conventional optimizing compilers have not been particularly effective for microcode compiling, because they optimize primarily within basic blocks (that is, segments of sequential code, uninterrupted by conditional jumps or jump targets), which are too small (3-5 instructions) to provide much code rearrangement. Hand coding, though slow and error-prone, has offered significant performance advantages over compiled microcode. Recent advances in optimization techniques-notably, trace scheduling [Fisher811 and percolation scheduling [Nicolau84]offer code rearrangement that crosses basic block boundaries. These code rearrangement techniques tend to cluster conditional jumps. Since conditional jumps make up 15-33% of the initial microcode, combining the conditional jumps of a cluster into a single multi-way jump offers substantial improvements in speed. Various schemes have
/lp/association-for-computing-machinery/efficient-hardware-for-multiway-jumps-and-pre-fetches-S5D0OooiSb