Message‐passing performance of various computers
Dongarra, Jack J.; Dunigan, Tom
doi: 10.1002/(SICI)1096-9128(199710)9:10<915::AID-CPE277>3.0.CO;2-C
This report compares the performance of different computer systems for basic message passing. Latency and bandwidth are measured on Convex, Cray, IBM, Intel, KSR, Meiko, nCUBE, NEC, SGI and TMC multiprocessors. Communication performance is contrasted with the computational power of each system. The comparison includes both shared and distributed memory computers as well as networked workstation clusters. © 1997 John Wiley & Sons, Ltd.
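As a rough illustration of how latency and bandwidth can be extracted from timing measurements, the sketch below fits the common linear cost model t(n) = latency + n / bandwidth to (message size, transfer time) pairs by least squares. The model and the function name `fit_latency_bandwidth` are assumptions made for illustration; the abstract does not state which fitting procedure the report uses.

```python
def fit_latency_bandwidth(samples):
    """Fit t = latency + size / bandwidth by ordinary least squares.

    samples: iterable of (message_size_in_bytes, transfer_time_in_seconds)
             with at least two distinct message sizes.
    Returns (latency_in_seconds, bandwidth_in_bytes_per_second).
    The linear model is an assumption, not taken from the report.
    """
    samples = list(samples)
    n = len(samples)
    mean_size = sum(size for size, _ in samples) / n
    mean_time = sum(time for _, time in samples) / n

    # Slope of the fitted line is seconds per byte, i.e. 1 / bandwidth.
    sxx = sum((size - mean_size) ** 2 for size, _ in samples)
    sxy = sum((size - mean_size) * (time - mean_time) for size, time in samples)
    seconds_per_byte = sxy / sxx

    # Intercept of the fitted line is the zero-length message time.
    latency = mean_time - seconds_per_byte * mean_size
    return latency, 1.0 / seconds_per_byte
```

The input pairs would typically come from a ping-pong test between two processors, with the round-trip time halved to give a one-way time.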
A performance debugging tool for high performance Fortran programs
Suzuoka, Takashi; Subhlok, Jaspal; Gross, Thomas
doi: 10.1002/(SICI)1096-9128(199710)9:10<927::AID-CPE278>3.0.CO;2-2
Parallel languages allow the programmer to express parallelism at a high level. The management of parallelism and the generation of interprocessor communication are left to the compiler and the runtime system. This approach to parallel programming is particularly attractive if a suitable, widely accepted parallel language is available. High Performance Fortran (HPF) has emerged as the first popular machine‐independent parallel language, and remarkable progress has been made towards compiling HPF efficiently. However, the performance of HPF programs is often poor and unpredictable, and obtaining adequate performance is a major stumbling block that must be overcome if HPF is to gain widespread acceptance. The programmer is often in the dark about how to improve the performance of an HPF program, since poor performance can have a variety of causes, including a poor choice of algorithm, limited use of parallelism, or an inefficient data mapping.
Parallel implementation of a ray tracing algorithm for distributed memory parallel computers
Lee, Tong‐Yee; Raghavendra, C. S.; Nicholas, John B.
doi: 10.1002/(SICI)1096-9128(199710)9:10<947::AID-CPE279>3.0.CO;2-Y
Ray tracing is a well‐known technique to generate life‐like images. Unfortunately, ray tracing complex scenes can require large amounts of CPU time and memory storage. Distributed memory parallel computers with large memory capacities and high processing speeds are ideal candidates to perform ray tracing. However, the computational cost of rendering pixels and the patterns of data access cannot be predicted until runtime. To parallelize such an application efficiently on distributed memory parallel computers, the issues of database distribution, dynamic data management and dynamic load balancing must be addressed. In this paper, we present a parallel implementation of a ray tracing algorithm on the Intel Delta parallel computer. In our database distribution, a small fraction of the database is duplicated on each processor, while the remaining part is evenly distributed among groups of processors. In the system, there are multiple copies of the entire database in the memory of groups of processors. Dynamic data management is achieved by an ALRU cache scheme which can exploit image coherence to reduce data movements in ray tracing consecutive pixels. We balance load among processors by distributing subimages to processors in a global fashion based on previous workload requests. The success of our implementation depends crucially on a number of parameters which are experimentally evaluated. © 1997 John Wiley & Sons, Ltd.
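The abstract does not describe the ALRU scheme in detail, so the following is only a sketch of a plain least-recently-used (LRU) cache, the general idea that such a scheme builds on. It shows why image coherence helps: consecutive pixels tend to request the same scene data, so repeated requests are served locally instead of being fetched again. The class name `LRUCache` and the `fetch` callback (standing in for a remote data request) are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Plain LRU cache; a stand-in, not the paper's ALRU scheme."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity      # maximum number of cached items
        self._fetch = fetch           # hypothetical remote-fetch callback
        self._items = OrderedDict()   # ordered oldest to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            # Cache hit: mark the item as most recently used.
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]

        # Cache miss: fetch the item and evict the least recently used
        # entry if the cache has grown past its capacity.
        self.misses += 1
        value = self._fetch(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)
        return value
```

In a ray tracer, `key` might identify a block of scene objects held by another processor, and the hit and miss counters give a simple way to measure how much data movement the cache avoids.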
Linear array for spelling correction
Fidanova, Stefka
doi: 10.1002/(SICI)1096-9128(199710)9:10<967::AID-CPE280>3.0.CO;2-L
This paper introduces a linear array for spelling correction using 15 processors. Many architectures have been proposed to solve similar string correction problems such as speech recognition or nucleic acid sequence computation. It is known that the hypercube, de Bruijn and grid networks contain a Hamiltonian path, a path which contains all the vertices of the network. The execution time of spelling correction on all of these networks is equal. © 1997 John Wiley & Sons, Ltd.
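String correction problems of this kind are commonly formulated in terms of edit distance. The sketch below is a sequential Levenshtein distance with a nearest-word lookup, given only to make the underlying computation concrete; it is an assumption that this formulation matches the paper's, and it is not the linear array algorithm itself. The function names `edit_distance` and `correct` are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def correct(word, dictionary):
    """Return the dictionary word closest to `word` (first one on ties)."""
    return min(dictionary, key=lambda candidate: edit_distance(word, candidate))
```

Each cell of the table depends only on its left, upper and upper-left neighbours, which is the kind of regular dependency pattern that lends itself to a pipelined array of processors.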
Parallel computation for natural convection
Wang, P.; Ferraro, R. D.
doi: 10.1002/(SICI)1096-9128(199710)9:10<975::AID-CPE281>3.0.CO;2-N
Parallel computation of two‐dimensional convective flows in cavities with adiabatic horizontal boundaries, driven by differential heating of the two vertical end walls, is investigated using the Intel Paragon, Intel Touchstone Delta, Cray T3D and IBM SP2. The numerical scheme, including a parallel multigrid solver, and the domain decomposition techniques for parallel computing are discussed in detail. Performance comparisons are made for the different parallel systems, and numerical results using various numbers of processors are discussed. © 1997 John Wiley & Sons, Ltd.