An Integrated Hardware-Software Approach to
Flexible Transactional Memory
Arrvindh Shriraman, Michael F. Spear, Hemayet Hossain, Virendra J. Marathe,
Sandhya Dwarkadas, and Michael L. Scott
Department of Computer Science, University of Rochester
There has been considerable recent interest in both hardware and
software transactional memory (TM). We present an intermediate
approach, in which hardware serves to accelerate a TM implementation
controlled fundamentally by software. Specifically, we describe an
alert-on-update mechanism (AOU) that allows a thread to receive fast,
asynchronous notification when previously identified lines are written
by other threads, and a programmable data isolation mechanism (PDI)
that allows a thread to hide its speculative writes from other threads,
ignoring conflicts, until software decides to make them visible. These
mechanisms reduce bookkeeping, validation, and copying overheads
without constraining software policy on a host of design decisions.
We have used AOU and PDI to implement a hardware-accelerated software
transactional memory system we call RTM. We have also used AOU alone
to create a simpler “RTM-Lite”. Across a range of microbenchmarks, RTM
outperforms RSTM, a publicly available software transactional memory
system, by as much as 8.7× (geometric mean of 3.5×) in single-thread
mode. At 16 threads, it outperforms RSTM by as much as 5×, with an
average speedup of 2×. Performance degrades gracefully when
transactions overflow hardware structures. RTM-Lite is slightly faster
than RTM for transactions that modify only small objects; full RTM is
significantly faster when objects are large. In a strong argument for
policy flexibility, we find that the choice between eager
(first-access) and lazy (commit-time) conflict detection can lead to
significant performance differences in both directions, depending on
application characteristics.
Categories and Subject Descriptors: B.3.2 [Memory Struc-
tures]: Design Styles—Shared memory D.1.3 [Programming
Techniques]: Concurrent Programming—Parallel programming
C.1.2 [Processor Architectures]: Multiprocessors
General Terms: Performance, Design, Languages
Keywords: Transactional memory, Cache coherence, Multiprocessors
This work was supported in part by NSF grants CCR-0204344,
CNS-0411127, CNS-0615139, and CNS-0509270; an IBM Faculty Partnership
Award; equipment support from Sun Microsystems Laboratories; and
financial support from Intel and Microsoft.
ISCA’07, June 9–13, 2007, San Diego, California, USA.
Copyright 2007 ACM 978-1-59593-706-3/07/0006 ...
1. INTRODUCTION AND BACKGROUND
Transactional memory (TM) has emerged as a promising alternative to
lock-based synchronization. TM systems seek to increase scalability,
reduce programming complexity, and overcome the semantic problems of
deadlock, priority inversion, and non-composability associated with
locks. Originally proposed by Herlihy and Moss, TM borrows the notions
of atomicity, consistency, and isolation from database transactions.
In a nutshell, the programmer or compiler labels sections of code as
atomic and relies on the underlying system to ensure that their
execution is serializable and as highly concurrent as possible. Several
hardware [1, 3, 7, 14, 16, 18–20] and software [5, 8, 12, 22, 24] TMs
have been proposed. Hardware has the advantage of speed, but embeds
significant policy in silicon. Software can run on stock processors
and preserves policy flexibility, but incurs significant overhead to
track data versions, detect conflicts between transactions, and
guarantee a consistent view of memory.
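To make the software overheads named above concrete, the following is a minimal single-threaded Python sketch of version-based bookkeeping in a software TM. It is an illustration under stated assumptions, not RSTM's actual implementation: the names TObject, Transaction, ConflictError, and the validations counter are all hypothetical, and a real system would need atomic operations where this sketch uses plain assignments.

```python
class ConflictError(Exception):
    """Raised when a previously read object has changed (hypothetical name)."""


class TObject:
    """A shared object with a version counter bumped on every committed write."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    """Toy transaction: tracks versions of objects read, buffers writes privately."""

    def __init__(self):
        self.read_set = {}    # TObject -> version seen when first read
        self.write_set = {}   # TObject -> speculative value (private copy)
        self.validations = 0  # number of per-object version checks performed

    def validate(self):
        # Re-check every object read so far; this is the bookkeeping cost.
        for obj, seen in self.read_set.items():
            self.validations += 1
            if obj.version != seen:
                return False
        return True

    def read(self, obj):
        if obj in self.write_set:
            return self.write_set[obj]
        if obj not in self.read_set:
            # Incremental validation: before opening a new object, confirm
            # that everything read earlier is still consistent.
            if not self.validate():
                raise ConflictError("read set changed")
            self.read_set[obj] = obj.version
        return obj.value

    def write(self, obj, value):
        self.write_set[obj] = value  # hidden from others until commit

    def commit(self):
        if not self.validate():
            return False
        for obj, value in self.write_set.items():
            obj.value = value
            obj.version += 1
        return True
```

In this sketch, opening n distinct objects costs 0 + 1 + ... + (n - 1) version checks before commit, so validation work grows quadratically with the size of the read set. That growth is one plausible reading of the "significant overhead" of guaranteeing a consistent view of memory in software.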
We propose that hardware serve simply to optimize the performance of
transactions that are controlled fundamentally by software. We present
a system, RTM, that embodies this philosophy. The RTM software
(currently based on a modified version of the RSTM software TM) retains
policy flexibility, and implements transactions unbounded in space and
in time.
The RTM hardware consists of 1) an alert-on-update mechanism (AOU) for
fast software-controlled conflict detection; and 2) programmable data
isolation (PDI), which allows potentially conflicting readers and
writers to proceed concurrently under software control. AOU is the
simpler and more general of the mechanisms. It can be used for almost
any task that benefits from fine-grain access control. In RTM, it
serves to capture transaction conflicts and guarantee memory
consistency without the heavy cost of continually validating objects
that were previously read. PDI additionally eliminates the cost of
data copying or logging in bounded transactions. In our experiments we
evaluate both full RTM (RTM-F) and an “RTM-Lite” that uses only AOU.
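The alert-on-update idea can be pictured with a small software model. The Python sketch below is only a simulation under stated assumptions: AlertingMemory, set_handler, load_and_mark, and store are hypothetical names, not the hardware interface. It also assumes that a mark is dropped once its alert has been delivered and that a thread is not alerted by its own writes.

```python
class AlertingMemory:
    """Toy model of memory whose lines can be marked for alert-on-update."""

    def __init__(self, n_lines):
        self.lines = [0] * n_lines
        self.marks = {}     # line index -> set of thread ids watching it
        self.handlers = {}  # thread id -> callable invoked with the line index

    def set_handler(self, tid, handler):
        # Register the software handler that runs when thread tid is alerted.
        self.handlers[tid] = handler

    def load_and_mark(self, tid, line):
        # Read a line and ask to be alerted if another thread writes it.
        self.marks.setdefault(line, set()).add(tid)
        return self.lines[line]

    def store(self, tid, line, value):
        self.lines[line] = value
        watchers = self.marks.get(line, set())
        # Every watcher except the writer itself receives an alert.
        alerted = {t for t in watchers if t != tid}
        # Assumption: delivering an alert clears that thread's mark.
        watchers.difference_update(alerted)
        for t in sorted(alerted):
            handler = self.handlers.get(t)
            if handler is not None:
                handler(line)
```

A reader that marks the lines it depends on no longer has to re-check them on every new access: in this model it simply records the alert (for example, by setting an abort flag in its handler) and otherwise proceeds without validation. For instance, after thread 1 calls load_and_mark on lines 0 and 2, a store by thread 2 to line 2 invokes thread 1's handler with argument 2, while a store to line 1 triggers nothing.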
Damron et al. describe a design philosophy for a hybrid TM system in
which hardware makes a “best effort” attempt to complete transactions,
falling back to software when necessary. The goal is to leverage
almost any reasonable hardware implementation. Kumar et al. describe a
specific hardware–software hybrid that builds on the software system
of Herlihy et al. Unfortunately, this system still embeds significant
policy in silicon. It assumes, for example, that conflicts are
detected as early as possible, disallowing either read-write or
write-write sharing. Scherer et al. [12, 23] report performance
differences across applications of 2×–10× in each direction for this
design decision, and for contention management and metadata
organization.
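The difference between detecting conflicts as early as possible (eager, at first access) and deferring detection to commit time (lazy) can be illustrated with a short Python sketch. This is a simplified model and not any of the cited systems: detect_conflicts and the schedule format are hypothetical, and the function only records where each policy would notice a conflict. It does not abort or stall any transaction, which a real contention manager would do.

```python
def detect_conflicts(schedule, policy):
    """Return (step, txn, other, obj) tuples where `policy` notices a conflict.

    schedule: list of (txn, op, obj) with op in {"read", "write", "commit"};
              obj is ignored for "commit".
    policy:   "eager" checks at every access, "lazy" checks only at commit.
    """
    reads, writes = {}, {}  # active txn -> set of objects read / written
    found = []
    for step, (txn, op, obj) in enumerate(schedule):
        reads.setdefault(txn, set())
        writes.setdefault(txn, set())
        if op == "commit":
            if policy == "lazy":
                # A committing writer conflicts with any still-active
                # transaction that has read or written the same object.
                for other in sorted(reads):
                    if other == txn:
                        continue
                    shared = writes[txn] & (reads[other] | writes[other])
                    for o in sorted(shared):
                        found.append((step, txn, other, o))
            del reads[txn]
            del writes[txn]
            continue
        if policy == "eager":
            # Read-write and write-write sharing are both flagged at once.
            for other in sorted(reads):
                if other == txn:
                    continue
                if obj in writes[other] or (op == "write" and obj in reads[other]):
                    found.append((step, txn, other, obj))
        (writes if op == "write" else reads)[txn].add(obj)
    return found
```

With the schedule T1 writes x, T2 reads x, T2 commits, T1 commits, the eager policy flags a conflict at T2's read, whereas the lazy policy flags none, since T2 finishes before T1 publishes its write. With two writers of x, eager flags the conflict at the second write, while lazy flags it only when the first writer commits, after the second has already done work that may be wasted. Which outcome is preferable depends on the workload, consistent with the differences in each direction reported above.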