Parallel Block Matrix Factorizations on the Shared-Memory Multiprocessor IBM 3090 VF/600J

International Journal of High Performance Computing Applications, SAGE

Abstract

Krister Dackland, Erik Elmroth, and Bo Kågström
Institute of Information Processing, University of Umeå, S-901 87 Umeå, Sweden

Charles Van Loan
Department of Computer Science, Cornell University, Ithaca, New York 14853-7501

Summary

Efficient parallel block algorithms for the LU factorization with partial pivoting, the Cholesky factorization, and the QR factorization, transportable over a range of parallel MIMD architectures, are presented. Parallel implementations of different block algorithms that utilize optimized uniprocessor level-3 BLAS are compared with corresponding routines of LAPACK (under development). Parallelism is mainly invoked implicitly in LAPACK by replacing calls to uniprocessor level-3 kernels by calls to parallel level-3 kernels and thereby maintaining portability. However, by parallelizing at the block level (explicitly) it is possible to overlap and pipeline different matrix-matrix operations and thereby gain some performance. Theoretical models give upper bounds on the best possible speedup of the explicitly and implicitly parallel block algorithms for the target machine.

The International Journal of Supercomputer Applications, Volume 6, No. 1, Spring 1992, pp. 69-97. © 1992 Massachusetts Institute of Technology.

Introduction

With the introduction of advanced parallel