Realizing High IPC Through a Scalable Memory-Latency Tolerant Multipath Microarchitecture

D. Morano, A. Khalafi, D.R. Kaeli
Northeastern University
dmorano, akhalafi, kaeli@ece.neu.edu

A.K. Uht
University of Rhode Island
uht@ele.uri.edu

Abstract

A microarchitecture is described that achieves high performance on conventional single-threaded program codes without compiler assistance. To obtain high instructions per clock (IPC) for inherently sequential (e.g., SpecInt-2000 programs), a large number of instructions must be in flight simultaneously. However, several problems are associated with such microarchitectures, including scalability, issues related to control flow, and memory latency.

Our design investigates how to utilize a large mesh of processing elements in order to execute a single-

designs. Unfortunately, most of the fine-grained ILP inherent in integer sequential programs spans several basic blocks. Data and control independent instructions, that may exist far ahead in the program instruction stream, need to be speculatively executed to exploit all possible inherent ILP.

A large number of instructions need to be fetched each cycle and executed concurrently in order to achieve this. We need to find the available program ILP at runtime and to provide sufficient hardware to expose, schedule, and otherwise manage the out-of-order speculative execution of control independent instructions. Of