Limits of Task-based Parallelism in Irregular Applications

Barbara Kreaseck, Dean Tullsen, Brad Calder
Department of Computer Science and Engineering
University of California, San Diego
La Jolla, CA 92093-0114
{kreaseck, tullsen, calder}@cs.ucsd.edu

Extended Abstract

Tomorrow's microprocessors will be able to handle multiple flows of control. Applications that exhibit task-level parallelism (TLP) and can be decomposed into parallel tasks will perform well on these platforms. TLP arises when a task is independent of its neighboring code. Traditional parallel compilers exploit one variety of TLP, loop-level parallelism (LLP), where loop iterations are executed in parallel. LLP is found overwhelmingly in numeric, typically FORTRAN, programs with regular patterns of data accesses. In contrast, irregular applications, typified by general-purpose integer applications, exhibit little LLP because they tend to access data in irregular patterns through pointers. Without pointer disambiguation to analyze data access dependences, traditional parallel compilers cannot parallelize these irregular applications and still ensure correct execution.

We focus on a different variety of TLP, namely Speculative Task Parallelism (STP). STP arises when a task (either a leaf procedure, a non-leaf procedure, or an entire loop) is control- and memory-independent of its preceding code, and thus could be executed in parallel with it. Two sections of code are memory-independent when neither contains a store to a memory location that the other accesses. To exploit STP, we assume a hypothetical speculative machine that supports speculative futures (a parallel programming construct that executes a task early on a different thread or processor), with mechanisms for resolving incorrect speculation when the task turns out not to be independent. This allows us to speculatively parallelize code when there is a high probability of independence, but no guarantee.
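The definition of memory-independence above can be made concrete with a minimal sketch. It assumes each code section is available as a recorded list of (operation, address) pairs; the names `footprint` and `memory_independent` and the trace format are illustrative assumptions, not part of the original work.

```python
from typing import Iterable, Set, Tuple

# Hypothetical trace record: ("load" | "store", address).
Access = Tuple[str, int]


def footprint(trace: Iterable[Access]) -> Tuple[Set[int], Set[int]]:
    """Collect the sets of loaded and stored addresses in one code section."""
    loads: Set[int] = set()
    stores: Set[int] = set()
    for op, addr in trace:
        if op == "store":
            stores.add(addr)
        else:
            loads.add(addr)
    return loads, stores


def memory_independent(x: Iterable[Access], y: Iterable[Access]) -> bool:
    """True when neither section stores to a location the other accesses."""
    x_loads, x_stores = footprint(x)
    y_loads, y_stores = footprint(y)
    x_all = x_loads | x_stores
    y_all = y_loads | y_stores
    # A conflict exists if a store in one section touches any address
    # (loaded or stored) used by the other section.
    return not (x_stores & y_all) and not (y_stores & x_all)
```

Two sections that only load a shared address remain independent under this check; a store in either section to an address the other touches makes them dependent.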
Figure 1 illustrates STP, showing a task Y in the dynamic instruction stream of an irregular application that has no memory access conflicts with a group of instructions, X, that precede Y. The shorter of X and Y determines the overlap of memory-independent instructions, as seen in Figures 1(b) and 1(c). In the absence of any register dependences, X and Y may be executed in parallel, resulting in shorter execution time. It is hard for traditional parallel compilers of pointer-based languages to expose this parallelism.

Figure 1: STP example. (a) A section of the dynamic instruction stream where the task Y is known to be memory-independent of the preceding code X. (b) The shaded region shows memory- and control-independent instructions that are essentially removed from the critical path when Y is executed in parallel with X. (c) The case when task Y is longer than X.

The goals of this paper are to identify such regions as X and Y within irregular applications and to find the number of instructions that may thus be removed from the critical path. This number represents the maximum STP when the cost of exploiting STP is zero. Because the biggest barrier to detecting independence in irregular codes is memory disambiguation, we identify memory-independent tasks using a profile-based approach and measure the amount of STP by estimating the number of memory-independent instructions those tasks expose. We vary the level of control dependence and memory dependence to investigate their effect on the amount of memory-independence we find. We profile at different memory granularities and introduce synchronization to expose higher levels of memory-independence. Across this variety of speculation assumptions, 7 to 22% of dynamic instructions are within tasks that are found to be memory-independent. These results were obtained on the SPECint95 benchmarks, a set of irregular applications for which traditional methods of parallelization are ineffective.
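As a rough illustration of how the overlap between X and Y might be measured from a profile, the sketch below walks backward from the start of a task Y until it meets the first conflicting memory access, then takes the shorter of the two regions. The trace format, the function name `overlap_for_task`, and the `block_bytes` parameter (standing in for the profiling granularity) are assumptions made for this example; it is not the authors' profiling tool, and it ignores control and register dependences.

```python
from typing import List, Optional, Tuple

# Hypothetical trace entry: None for an instruction with no memory access,
# otherwise ("load" | "store", address).
Entry = Optional[Tuple[str, int]]


def overlap_for_task(trace: List[Entry], start: int, end: int,
                     block_bytes: int = 1) -> int:
    """Estimate the instructions removed from the critical path for task Y.

    Y is trace[start:end]. X is the longest run of instructions immediately
    before Y that is memory-independent of Y at the given granularity.
    """
    y_loads, y_stores = set(), set()
    for entry in trace[start:end]:
        if entry is not None:
            op, addr = entry
            (y_stores if op == "store" else y_loads).add(addr // block_bytes)
    y_all = y_loads | y_stores

    x_len = 0
    for entry in reversed(trace[:start]):
        if entry is not None:
            op, addr = entry
            block = addr // block_bytes
            # Conflict: X touches a block Y stores to, or X stores to a
            # block Y accesses.
            if block in y_stores or (op == "store" and block in y_all):
                break
        x_len += 1

    # The shorter of X and Y bounds the overlapped instructions.
    return min(x_len, end - start)
```

In this sketch a coarser `block_bytes` can only shrink X, since distinct addresses that fall in the same block are treated as conflicting.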