A VLIW ARCHITECTURE FOR OPTIMAL EXECUTION OF BRANCH-INTENSIVE LOOPS

Bogong Su†, Wei Zhao‡, and Zhizhong Tang
Dept. of Computer Science, Tsinghua University, Beijing 100084, China

Stanley Habib
Computer Science Dept., The Graduate School and University Center, The City University of New York, 33 W. 42 St., New York, NY 10036
Email: [illegible]@CUNYVM.CUNY.EDU  Tel: 212-642-2201  FAX: 212-642-1902

ABSTRACT

We propose a VLIW architectural model for optimal execution of branch-intensive loops, together with URPR-2, a new single-chip architecture for digital signal and image processing based on this model. In this architecture, instructions belonging to different iterations and different paths can be executed simultaneously. Instruction-level parallelism can be exploited over a wider scope because multi-way branching can be processed in each machine cycle. A mechanism called the pipeline control blackboard (PCBB) is also proposed to support conditional branches. The URPR-2 can not only execute loops consisting of a single basic block at high speed but can also run loops with conditional branches at a reduced cost in time and space.

Many approaches have been developed to solve the branch problem at compile time. They can obtain very good results for loops with sparse branches, where parallelism sometimes reaches 90 operations per cycle [Nicolau84]. But they perform poorly on branch-intensive loops,