Code Scheduling for VLIW/Superscalar Processors with Limited Register Files

Tokuzo Kiyohara
Media Research Laboratory
Matsushita Electric Industrial Co., Ltd.
Kadoma-shi, Osaka, 571 Japan

John C. Gyllenhaal
Coordinated Science Laboratory
University of Illinois, Urbana-Champaign
Urbana, IL 61801

Abstract

Moderate size register files can limit the performance of loop unrolling on multiple issue processors. With current scheduling heuristics, a breadth-first scheduling of iterations occurs, increasing register pressure and generating excessive spill code. A heuristic is proposed that causes a more depth-first scheduling of unrolled iterations. This heuristic reduces the overlapping of the unrolled iterations and, as a result, reduces register pressure. The experimental evaluation shows increased performance on processors with 32 or 64 registers. In addition, the performance of dependency removing optimizations is stabilized, so that applying additional optimizations is more likely to increase performance.

Figure 1: Breadth-first and depth-first scheduling of an unrolled loop. Surviving panel labels: a) original; c) breadth-first schedule; (15 cycles).

Introduction

In multiple instruction issue processors, such as ... sequence extracts more performance than postscheduling alone. Goodman and Hsu [3] showed that a prepass scheduler can avoid introducing excessive spill code by switching between two scheduling algorithms when the number of available registers passes
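The register-pressure contrast described in the abstract can be illustrated with a toy model. The sketch below is not the heuristic proposed in the paper: it assumes a hypothetical five-operation loop body, a single-issue linear schedule, and the two extreme orderings (fully breadth-first and fully depth-first). It counts the peak number of simultaneously live values under each ordering. All names (`ITERATION`, `unroll`, `breadth_first`, `depth_first`, `max_live`) are illustrative assumptions.

```python
# Hypothetical loop body: (opcode, destination or None, sources).
ITERATION = [
    ("load", "a", []),
    ("load", "b", []),
    ("mul", "p", ["a", "b"]),
    ("add", "s", ["p"]),
    ("store", None, ["s"]),
]


def unroll(n):
    """Return n copies of the body with registers renamed per iteration."""
    return [
        [(op, f"{dst}{i}" if dst else None, [f"{s}{i}" for s in srcs])
         for op, dst, srcs in ITERATION]
        for i in range(n)
    ]


def breadth_first(iters):
    """Emit operation k of every iteration before operation k+1 of any."""
    return [it[k] for k in range(len(iters[0])) for it in iters]


def depth_first(iters):
    """Emit every operation of one iteration before starting the next."""
    return [op for it in iters for op in it]


def max_live(schedule):
    """Peak number of values that are defined but not yet last-used."""
    defined_at, last_use = {}, {}
    for pos, (_, dst, srcs) in enumerate(schedule):
        for s in srcs:
            last_use[s] = pos
        if dst is not None:
            defined_at[dst] = pos
    peak = 0
    for pos in range(len(schedule)):
        # A value is live after position pos if it was defined at or
        # before pos and its last use comes strictly later.
        live = sum(1 for v, d in defined_at.items()
                   if d <= pos < last_use.get(v, d))
        peak = max(peak, live)
    return peak


iters = unroll(4)
print(max_live(breadth_first(iters)))  # 8
print(max_live(depth_first(iters)))    # 2
```

In this model the breadth-first order keeps both loaded operands of all four iterations live at once, while the depth-first order never holds more than one iteration's operands. A real multiple-issue scheduler sits between these extremes, trading overlap between iterations against register pressure.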