An Abstract Interface for System Software on Large-Scale Clusters

Juan Fernández; Eitan Frachtenberg; Fabrizio Petrini; José-Carlos Sancho

doi:10.1093/comjnl/bxl020

References (45)

F. Petrini, Wu-chun Feng (2001)
Improved Resource Utilization with Buffered Coscheduling
Parallel Algorithms and Applications, 16
T. Eicken, D. Culler, S. Goldstein, K. Schauser (1992)
Active Messages: A Mechanism for Integrated Communication and Computation
Proceedings of the 19th Annual International Symposium on Computer Architecture
L. Lamport (1979)
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs
IEEE Transactions on Computers, C-28
(1993)
Cray T3D. System Architecture Overview
R. Gioiosa, J. Sancho, Song Jiang, F. Petrini (2005)
Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers
ACM/IEEE SC 2005 Conference (SC'05)
F. Petrini, D. Kerbyson, S. Pakin (2003)
The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q
ACM/IEEE SC 2003 Conference (SC'03)
R. Brightwell, L. Fisk (2001)
Scalable Parallel Application Launch on Cplant™
ACM/IEEE SC 2001 Conference (SC'01)
Charles Leiserson, Z. Abuhamdeh, David Douglas, C. Feynman, Mahesh Ganmukhi, Jeffrey Hill, W. Hillis, Bradley Kuszmaul, Margaret Pierre, D. Wells, Monica Wong-Chan, Shaw-Wen Yang, R. Zak (1996)
The Network Architecture of the Connection Machine CM-5
J. Parallel Distributed Comput., 33
(1999)
Elite Reference Manual
E. Frachtenberg, F. Petrini, Juan Peinador, S. Pakin, S. Coll (2002)
STORM: Lightning-Fast Resource Management
ACM/IEEE SC 2002 Conference (SC'02)
M. Snir, S. Otto, D. Walker, J. Dongarra, S. Huss-Lederman (1996)
MPI: The Complete Reference
D. Culler, R. Karp, D. Patterson, A. Sahay, K. Schauser, E. Santos, R. Subramonian, T. Eicken (1993)
LogP: towards a realistic model of parallel computation
D. Kerbyson, H. Alme, A. Hoisie, F. Petrini, H. Wasserman, M. Gittings (2001)
Predictive Performance and Scalability Modeling of a Large-Scale Application
ACM/IEEE SC 2001 Conference (SC'01)
F. Petrini, Wu-chun Feng, A. Hoisie, S. Coll, E. Frachtenberg (2002)
The Quadrics Network: High-Performance Clustering Technology
IEEE Micro, 22
G. Almási, Ralph Bellofatto, J. Brunheroto, Calin Cascaval, J. Castaños, P. Crumley, C. Erway, D. Lieber, X. Martorell, José Moreira, R. Sahoo, A. Sanomiya, L. Ceze, K. Strauss (2003)
An Overview of the Blue Gene/L System Software Organization
Parallel Process. Lett., 13
김성운, 모상만, 권혁제, 김보관 (2001)
InfiniBand Physical Layer Design
(1999)
Cplant. ;login: USENIX Magazine
(1992)
NI System Programming
(2002)
How does ASCI actually complete multimonth 1000-processor milestone simulations?
(1999)
Quadrics Supercomputers World Ltd
Jiuxing Liu, A. Mamidala, Abhinav Vishnu, D. Panda (2005)
Evaluating InfiniBand performance with PCI Express
IEEE Micro, 25
Yang-Suk Kee, S. Ha (2002)
An Efficient Implementation of the BSP Programming Library for VIA
Parallel Process. Lett., 12
J. Fernandez, E. Frachtenberg, F. Petrini (2003)
BCS-MPI: A New Approach in the System Software Design for Large-Scale Parallel Computers
ACM/IEEE SC 2003 Conference (SC'03)
D. Culler, J. Singh, Anoop Gupta (1998)
Parallel Computer Architecture: A Hardware/Software Approach
J. Sancho, F. Petrini, Greg Johnson, Juan Peinador, E. Frachtenberg (2004)
On the feasibility of incremental checkpointing for scientific computing
18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
G. Bosilca, Aurélien Bouteiller, F. Cappello, Samir Djilali, G. Fedak, C. Germain, T. Hérault, Pierre Lemarinier, O. Lodygensky, F. Magniette, V. Néri, A. Selikhov (2002)
MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes
ACM/IEEE SC 2002 Conference (SC'02)
L. Valiant (1990)
A bridging model for parallel computation
Commun. ACM, 33
Jonathan Hill, W. McColl, D. Stefanescu, M. Goudreau, Kevin Lang, Satish Rao, Torsten Suel, T. Tsantilas, R. Bisseling (1998)
BSPlib: The BSP programming library
Parallel Comput., 24
E. Frachtenberg, F. Petrini, S. Coll, Wu-chun Feng (2001)
Gang scheduling with lightweight user-level communication
Proceedings International Conference on Parallel Processing Workshops
D. Feitelson, L. Rudolph (1992)
Gang Scheduling Performance Benefits for Fine-Grain Synchronization
J. Parallel Distributed Comput., 16
Tomio Kamada, S. Matsuoka, A. Yonezawa (1994)
Efficient parallel global garbage collection on massively parallel computers
Proceedings of Supercomputing '94
K. Davis, A. Hoisie, Greg Johnson, D. Kerbyson, M. Lang, S. Pakin, F. Petrini (2004)
A Performance and Scalability Analysis of the BlueGene/L Architecture
Proceedings of the ACM/IEEE SC2004 Conference
E. Hendriks (2002)
BProc: the Beowulf distributed process space
A. Hori, H. Tezuka, Y. Ishikawa (1998)
Overhead Analysis of Preemptive Gang Scheduling
N. Adiga, G. Almási, G. Almási, Y. Aridor, R. Barik, D. Beece, Ralph Bellofatto, G. Bhanot, R. Bickford, M. Blumrich, A. Bright, J. Brunheroto, Calin Cascaval, J. Castaños, W. Chan, L. Ceze, P. Coteus, S. Chatterjee, Dong Chen, G. Chiu, T. Cipolla, P. Crumley, K. Desai, A. Deutsch, T. Domany, M. Dombrowa, W. Donath, M. Eleftheriou, C. Erway, J. Esch, B. Fitch, J. Gagliano, A. Gara, R. Garg, R. Germain, M. Giampapa, B. Gopalsamy, John Gunnels, Manish Gupta, F. Gustavson, S. Hall, R. Haring, D. Heidel, P. Heidelberger, L. Herger, D. Hoenicke, R. Jackson, T. Jamal-Eddine, G. Kopcsay, E. Krevat, M. Kurhekar, A. Lanzetta, D. Lieber, L. Liu, M. Lu, M. Mendell, A. Misra, Y. Moatti, L. Mok, J. Moreira, B. Nathanson, M. Newton, M. Ohmacht, A. Oliner, Vinayaka Pandit, R. Pudota, R. Rand, R. Regan, B. Rubin, A. Ruehli, S. Rus, R. Sahoo, A. Sanomiya, E. Schenfeld, M. Sharma, Edi Shmueli, Sarabjeet Singh, Peilin Song, V. Srinivasan, B. Steinmacher-Burow, K. Strauss, C. Surovic, R. Swetz, T. Takken, R. Tremaine, M. Tsao, A. Umamaheshwaran, P. Verma, P. Vranas, T. Ward, M. Wazlowski, W. Barrett, C. Engel, B. Drehmel, B. Hilgart, D. Hill, F. Kasemkhani, D. Krolak, Chun-Tao Li, T. Liebsch, J. Marcella, A. Muff, A. Okomo, M. Rouse, A. Schram, M. Tubbs, G. Ulsh, Charles Wait, J. Wittrup, M. Bae, Kenneth Dockser, L. Kissel, M. Seager, J. Vetter, K. Yates (2002)
An Overview of the BlueGene/L Supercomputer
ACM/IEEE SC 2002 Conference (SC'02)
S. Scott (1996)
Synchronization and communication in the T3E multiprocessor
S. Fortune, J. Wyllie (1978)
Parallelism in random access machines
Proceedings of the tenth annual ACM symposium on Theory of computing
V. Sunderam (1990)
PVM: A Framework for Parallel Distributed Computing
Concurr. Pract. Exp., 2
(1992)
Solution of the first-order form of the 3-D discrete ordinates equation on a massively parallel processor
R. Bhoedjang, Tim Rühl, H. Bal (1998)
Efficient multicast on Myrinet using link-level flow control
Proceedings of the 1998 International Conference on Parallel Processing
David Petrou, Steven Rodrigues, Amin Vahdat, T. Anderson (1998)
GLUix: a global layer unix for a network of workstations
Software: Practice and Experience, 28
F. Petrini, Juan Peinador, E. Frachtenberg, S. Coll (2003)
Scalable collective communication on the ASCI Q machine
11th Symposium on High Performance Interconnects, 2003. Proceedings.
Jiuxing Liu, A. Mamidala, D. Panda (2004)
Fast and scalable MPI-level broadcast using InfiniBand's hardware multicast support
18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
H. Franke, P. Pattnaik, L. Rudolph (1996)
Gang scheduling for highly efficient, distributed multiprocessor systems
Proceedings of 6th Symposium on the Frontiers of Massively Parallel Computation (Frontiers '96)
Weikuan Yu, Darius Buntinas, R. Graham, D. Panda (2004)
Efficient and scalable barrier over Quadrics and Myrinet with a new NIC-based collective message passing protocol
18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.

Publisher: Oxford University Press
ISSN: 0010-4620
eISSN: 1460-2067
DOI: 10.1093/comjnl/bxl020
