Technology scaling trends have forced designers to consider alternatives to deeply pipelined, aggressive cores with large amounts of performance-accelerating hardware. One alternative is a small, simple core augmented with latency-tolerant helper engines. Because the demands placed on the processor core vary between applications, and even between phases of an application, the benefit seen from any set of helper engines varies tremendously. On a single core, these auxiliary structures can be turned on and off dynamically to tune the energy/performance of the machine to the needs of the running application.

As more of the processor is broken down into helper engines, and as more cores that can potentially share helpers are placed on a single chip, the decisions made about these structures become increasingly important. In this paper we describe the need for methods that effectively manage these helper engines. Our counter-based approach can dynamically turn off three helpers on average while staying within 2% of the performance achieved with all helpers enabled. In a multicore environment, our intelligent and flexible sharing of helper engines provides an average 24% speedup over static sharing in conjoined cores. Furthermore, we show benefit from constructively sharing helper engines among multiple cores running the same application.
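The counter-based management the abstract describes could be sketched roughly as follows. This is an illustrative assumption, not the paper's actual mechanism: every name, counter width, and threshold here is hypothetical, chosen only to show the idea of gating a helper engine off when a saturating benefit counter indicates it is not earning its keep.

```python
# Hypothetical sketch of counter-based helper-engine management.
# All names and thresholds are illustrative assumptions, not taken
# from the paper.

class HelperEngine:
    def __init__(self, name):
        self.name = name
        self.enabled = True
        self.benefit_counter = 0  # saturating counter of observed usefulness

    def record_event(self, useful):
        # Increment on a useful event (e.g. a prefetch that hit),
        # decrement on a useless one; saturate at [0, 255].
        if useful:
            self.benefit_counter = min(self.benefit_counter + 1, 255)
        else:
            self.benefit_counter = max(self.benefit_counter - 1, 0)

THRESHOLD = 16  # assumed cutoff below which a helper is gated off

def manage(helpers):
    """At each sampling interval, gate off helpers whose counters show
    little benefit; helpers whose counters recover are re-enabled."""
    for h in helpers:
        h.enabled = h.benefit_counter >= THRESHOLD
    return [h.name for h in helpers if h.enabled]

# Usage: simulate an interval in which only the value predictor was useful.
helpers = [HelperEngine("prefetcher"), HelperEngine("value_predictor")]
for _ in range(20):
    helpers[0].record_event(False)
    helpers[1].record_event(True)
print(manage(helpers))  # ['value_predictor']
```

A real design would also need a mechanism to periodically re-enable gated helpers to re-sample their benefit, since a disabled engine generates no events from which its counter could recover.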
ACM SIGARCH Computer Architecture News – Association for Computing Machinery
Published: Nov 1, 2005