Technology scaling trends have forced designers to consider alternatives to deeply pipelined, aggressive cores with large amounts of performance-accelerating hardware. One alternative is a small, simple core augmented with latency-tolerant helper engines. Because the demands placed on the processor core vary between applications, and even between phases of an application, the benefit seen from any set of helper engines varies tremendously. On a single core, these auxiliary structures can be turned on and off dynamically to tune the energy/performance of the machine to the needs of the running application.

As more of the processor is broken down into helper engines, and as more cores that can potentially share helpers are placed on a single chip, the decisions made about these structures become increasingly important. In this paper we describe the need for methods that effectively manage these helper engines. Our counter-based approach can dynamically turn off three helpers on average while staying within 2% of the performance achieved with all helpers enabled. In a multicore environment, our intelligent and flexible sharing of helper engines provides an average 24% speedup over static sharing in conjoined cores. Furthermore, we show benefit from constructively sharing helper engines among multiple cores running the same application.
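The counter-based management the abstract describes could be sketched roughly as follows. This is an illustrative assumption, not the paper's actual mechanism: every name, counter width, and threshold here is hypothetical, chosen only to show the idea of gating a helper engine off when a saturating benefit counter indicates it is not earning its keep.

```python
# Hypothetical sketch of counter-based helper-engine management.
# All names and thresholds are illustrative assumptions, not taken
# from the paper.

class HelperEngine:
    def __init__(self, name):
        self.name = name
        self.enabled = True
        self.benefit_counter = 0  # saturating counter of observed usefulness

    def record_event(self, useful):
        # Increment on a useful event (e.g. a prefetch that hit),
        # decrement on a useless one; saturate at [0, 255].
        if useful:
            self.benefit_counter = min(self.benefit_counter + 1, 255)
        else:
            self.benefit_counter = max(self.benefit_counter - 1, 0)

THRESHOLD = 16  # assumed cutoff below which a helper is gated off

def manage(helpers):
    """At each sampling interval, gate off helpers whose counters show
    little benefit; helpers whose counters recover are re-enabled."""
    for h in helpers:
        h.enabled = h.benefit_counter >= THRESHOLD
    return [h.name for h in helpers if h.enabled]

# Usage: simulate an interval in which only the value predictor was useful.
helpers = [HelperEngine("prefetcher"), HelperEngine("value_predictor")]
for _ in range(20):
    helpers[0].record_event(False)
    helpers[1].record_event(True)
print(manage(helpers))  # ['value_predictor']
```

A real design would also need a mechanism to periodically re-enable gated helpers to re-sample their benefit, since a disabled engine generates no events from which its counter could recover.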
ACM SIGARCH Computer Architecture News – Association for Computing Machinery
Published: Nov 1, 2005