Efficient Householder QR factorization for superscalar processors

Publisher: Association for Computing Machinery
Copyright © 1997 by ACM Inc.
ISSN: 0098-3500
DOI: 10.1145/275323.275326
Abstract

To extract the potential promised by superscalar processors, algorithm designers must streamline memory references and allow for efficient data reuse throughout the memory hierarchy. Two parameterized Householder QR factorization algorithms are presented that take into account the caches and registers typical of such processors. Guidelines are developed for choosing parameter values that obtain near-optimal cache and register utilization. The new algorithms are implemented and performance-tuned on an Intel Pentium Pro system, a single thin POWER2 node of the IBM Scalable Parallel System 2 (SP2), and a single R8000 processor of a Silicon Graphics POWER Challenge XL.
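The abstract does not spell out the two algorithms, so what follows is only a generic illustration of the kind of cache blocking it refers to, not the authors' method. It is a blocked Householder QR factorization in which a block size `nb` (a hypothetical tuning parameter standing in for the paper's cache-related parameters) sets how many columns are factored as a panel before their reflectors are aggregated into the compact WY form Q = I - V T V^T and applied to the remaining columns in one pass. This is the same overall structure LAPACK uses in `DGEQRF`. The sketch works on pure Python lists of rows for readability; the names `householder`, `apply_reflector`, `blocked_qr` and `upper` are illustrative, and register-level blocking is not modelled.

```python
import math


def householder(x):
    """Return (v, beta) with v[0] == 1 so that (I - beta v v^T) x = ||x|| e1.

    Uses the cancellation-free choice of v[0] described by Golub and Van Loan."""
    sigma = sum(xi * xi for xi in x[1:])
    if sigma == 0.0:
        return [1.0] + [0.0] * (len(x) - 1), 0.0  # nothing to annihilate
    mu = math.sqrt(x[0] * x[0] + sigma)
    v1 = x[0] - mu if x[0] <= 0.0 else -sigma / (x[0] + mu)
    beta = 2.0 * v1 * v1 / (sigma + v1 * v1)
    return [1.0] + [xi / v1 for xi in x[1:]], beta


def apply_reflector(A, v, beta, row0, col0, col1):
    """A[row0:, col0:col1] := (I - beta v v^T) A[row0:, col0:col1]."""
    for c in range(col0, col1):
        s = beta * sum(vi * A[row0 + i][c] for i, vi in enumerate(v))
        for i, vi in enumerate(v):
            A[row0 + i][c] -= s * vi


def blocked_qr(A, nb):
    """In-place blocked Householder QR of an m-by-n matrix A (m >= n).

    nb is a hypothetical block size (panel width). On return the upper
    triangle of A holds R; the list of (v, beta) reflectors is returned."""
    m, n = len(A), len(A[0])
    reflectors = []
    for k in range(0, n, nb):
        b = min(nb, n - k)
        # 1. Unblocked factorization of the panel A[k:, k:k+b].
        panel = []
        for j in range(k, k + b):
            v, beta = householder([A[i][j] for i in range(j, m)])
            apply_reflector(A, v, beta, j, j, k + b)
            panel.append((v, beta))
        reflectors.extend(panel)
        if k + b >= n:
            continue
        # 2. Aggregate the panel: H_1 ... H_b = I - V T V^T, where V is
        #    (m-k)-by-b unit lower trapezoidal and T is b-by-b upper triangular.
        rows = m - k
        V = [[0.0] * b for _ in range(rows)]
        for jj, (v, _) in enumerate(panel):
            for i, vi in enumerate(v):
                V[jj + i][jj] = vi
        T = [[0.0] * b for _ in range(b)]
        for jj, (_, beta) in enumerate(panel):
            z = [sum(V[i][p] * V[i][jj] for i in range(rows)) for p in range(jj)]
            for p in range(jj):
                T[p][jj] = -beta * sum(T[p][q] * z[q] for q in range(jj))
            T[jj][jj] = beta
        # 3. Trailing update C := (I - V T^T V^T) C for C = A[k:, k+b:],
        #    i.e. W = V^T C, W = T^T W, C = C - V W, one column at a time.
        for c in range(k + b, n):
            w = [sum(V[i][p] * A[k + i][c] for i in range(rows)) for p in range(b)]
            w = [sum(T[q][p] * w[q] for q in range(b)) for p in range(b)]
            for i in range(rows):
                A[k + i][c] -= sum(V[i][p] * w[p] for p in range(b))
    return reflectors


def upper(A):
    """Extract the n-by-n upper-triangular factor R from a factored A."""
    n = len(A[0])
    return [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
```

With `nb` equal to the number of columns the routine reduces to the unblocked algorithm, so both settings should yield the same R up to rounding. In a compiled implementation the nested sums in step 3 become matrix-matrix products, which is what lets a well-chosen `nb` keep a panel resident in cache; picking that value per machine is roughly the kind of tuning the abstract describes.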

Journal

ACM Transactions on Mathematical Software (TOMS), Association for Computing Machinery

Published: Sep 1, 1997