Concurrency and Computation: Practice & Experience

journal article

LitStream Collection

Masthead

1996 Concurrency and Computation: Practice & Experience

doi: 10.1002/cpe.4330080701

Effective data parallel computation using the Psi calculus

Mullin, L.M.R.; Jenkins, M.A.

1996 Concurrency and Computation: Practice & Experience

doi: 10.1002/(SICI)1096-9128(199609)8:7<499::AID-CPE230>3.0.CO;2-1

Large scale scientific computing necessitates finding a way to match the high level understanding of how a problem can be solved with the details of its computation in a processing environment organized as networks of processors. Effective utilization of parallel architectures can then be achieved by using formal methods to describe both computations and computational organizations within these networks. By returning to the mathematical treatment of a problem as a high level numerical algorithm we can express it as an algorithmic formalism that captures the inherent parallelism of the computation. We then give a meta description of an architecture followed by the use of transformational techniques to convert the high level description into a program that utilizes the architecture effectively. The hope is that one formalism can be used to describe both computations and architectures and that a methodology for automatically transforming computations can be developed. The formalism and methodology presented in the paper are a first step toward the ambitious goals described above. The paper uses a theory of arrays, the Psi calculus, as the formalism, and two levels of conversions: one for simplification and another for data mapping.
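As an illustration of the kind of array indexing the Psi calculus formalizes, the following is a minimal sketch and not the authors' implementation. It assumes the usual description of psi as an indexing function that takes a partial index vector and an array and returns the selected subarray, and it assumes a row-major flat layout. The function name `psi` and its argument order are hypothetical choices made for this example.

```python
from math import prod


def psi(index, shape, flat):
    """Select the subarray addressed by a partial index vector.

    `shape` describes an array stored row-major in the flat list `flat`.
    `index` may be shorter than `shape`; the result then has the
    remaining trailing dimensions. (Hypothetical helper for illustration.)
    """
    if len(index) > len(shape):
        raise ValueError("index vector is longer than the array's dimensionality")
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index component out of bounds")

    # Shape of the result: the dimensions not consumed by the index vector.
    sub_shape = tuple(shape[len(index):])
    size = prod(sub_shape)  # prod(()) == 1, so a full index selects one element

    # Row-major offset of the selected subarray, counted in units of `size`.
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i

    start = offset * size
    return sub_shape, flat[start:start + size]


# A 2 x 3 x 4 array holding 0..23 in row-major order.
shape = (2, 3, 4)
flat = list(range(24))
sub_shape, sub_flat = psi((1, 2), shape, flat)
```

In this sketch a partial index reduces to pure offset arithmetic on the shape, which hints at how expressions over arrays can be simplified before data is mapped to processors.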

PB‐BLAS: a set of parallel block basic linear algebra subprograms

Choi, Jaeyoung; Dongarra, Jack J.; Walker, David W.

1996 Concurrency and Computation: Practice & Experience

doi: 10.1002/(SICI)1096-9128(199609)8:7<517::AID-CPE226>3.0.CO;2-W

We propose a new software package which would be very useful for implementing dense linear algebra algorithms on block‐partitioned matrices. The routines are referred to as block basic linear algebra subprograms (BLAS), and their use is restricted to computations in which one or more of the matrices involved consists of a single row or column of blocks, and in which no more than one of the matrices consists of an unrestricted two‐dimensional array of blocks. The functionality of the block BLAS routines can also be provided by Level 2 and 3 BLAS routines. However, for non‐uniform memory access machines the use of the block BLAS permits certain optimizations in memory access to be taken advantage of. This is particularly true for distributed memory machines, for which the block BLAS are referred to as the parallel block basic linear algebra subprograms (PB‐BLAS). The PB‐BLAS are the main focus of this paper, and for a block‐cyclic data distribution, a single row or column of blocks lies in a single row or column of the processor template.
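The property stated in the last sentence can be checked with a small sketch. This is not PB‐BLAS code; it assumes the standard two‐dimensional block‐cyclic mapping, in which block (i, j) is assigned to processor (i mod P, j mod Q) on a P x Q processor template. The function names are hypothetical.

```python
def block_owner(block_row, block_col, proc_rows, proc_cols):
    """Processor coordinates owning a block under a 2D block-cyclic
    distribution (assumed mapping: indices taken modulo the grid size)."""
    return (block_row % proc_rows, block_col % proc_cols)


def processor_rows_for_block_row(block_row, num_block_cols, proc_rows, proc_cols):
    """Set of processor rows touched by one full row of blocks."""
    return {
        block_owner(block_row, j, proc_rows, proc_cols)[0]
        for j in range(num_block_cols)
    }


# On a 2 x 3 processor template, block row 5 of a matrix with 7 block
# columns is spread over the processor columns but stays in one processor row.
rows_touched = processor_rows_for_block_row(5, 7, 2, 3)
```

Because the processor row depends only on the block row index, a single row of blocks always maps to a single row of the template; the same argument applies to columns.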

SPEED: A parallel platform for solving and predicting the performance of PDEs on distributed systems

Hui, Chi‐Chung; Hamdi, Mounir; Ahmad, Ishfaq

1996 Concurrency and Computation: Practice & Experience

doi: 10.1002/(SICI)1096-9128(199609)8:7<537::AID-CPE225>3.0.CO;2-X

Distributed systems such as networks of workstations are becoming an increasingly viable alternative to traditional supercomputer systems for running complex scientific applications. A large number of these applications require solving sets of partial differential equations (PDEs). In this paper, we describe the implementation and performance of SPEED (Scalable Partial differential Equation Environment on Distributed systems), a parallel platform which provides an efficient solution for time‐dependent PDEs. SPEED allows the inclusion of a wide range of parameters and programming aids. PVM is employed as the underlying message‐passing system. The parallel implementation has been performed using two algorithms. The first algorithm is a two‐phase scheme which uses the conventional technique of alternating phases of computation and communication. The second algorithm employs a pre‐computation technique that allows overlapping of computation and communication. Both methods yield significant speedups. The pre‐computation technique reduces the communication time between the workstations but incurs additional overhead in buffer management. Hence, if the saving in communication time is larger than the overhead, the pre‐computation technique outperforms the two‐phase algorithm. SPEED also provides a performance prediction methodology that can accurately predict the performance of a given application on the system before running the application. This methodology allows the user to tune various parameters in order to identify system bottlenecks and maximize the performance.
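The trade‐off between the two algorithms can be written as a toy cost comparison. This is a minimal sketch under simplifying assumptions, not SPEED's performance prediction methodology: it assumes per‐step time is a simple sum of computation, communication, and buffer‐management terms, and every function and parameter name below is hypothetical.

```python
def two_phase_time(t_compute, t_comm):
    """Per-step time when computation and communication alternate."""
    return t_compute + t_comm


def precompute_time(t_compute, t_comm, comm_saving, buffer_overhead):
    """Per-step time when overlap hides `comm_saving` of the communication
    time at the price of `buffer_overhead` for buffer management."""
    if not 0 <= comm_saving <= t_comm:
        raise ValueError("comm_saving must lie between 0 and t_comm")
    return t_compute + (t_comm - comm_saving) + buffer_overhead


def precompute_wins(comm_saving, buffer_overhead):
    """The condition from the abstract: the saving must exceed the overhead."""
    return comm_saving > buffer_overhead


# Illustrative numbers only (arbitrary time units, not measured data).
baseline = two_phase_time(10, 4)
overlapped = precompute_time(10, 4, comm_saving=3, buffer_overhead=1)
```

Under this model the pre‐computation variant is faster exactly when `precompute_wins` returns true, since the computation term is common to both expressions.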
