Supporting high‐performance I/O in QoS‐enabled ORB middleware
Kuhns, Fred; Levine, David; Schmidt, Douglas; O'Ryan, Carlos
doi: 10.1023/A:1019032220910
pmid: N/A
To be an effective platform for high-performance distributed applications, off-the-shelf Object Request Broker (ORB) middleware, such as CORBA, must preserve communication-layer quality of service (QoS) properties both vertically (i.e., network interface ↔ application layer) and horizontally (i.e., end-to-end). However, conventional network interfaces, I/O subsystems, and middleware interoperability protocols are not well-suited for applications that possess stringent throughput, latency, and jitter requirements. It is essential, therefore, to develop vertically and horizontally integrated ORB endsystems that can be (1) configured flexibly to support high-performance network interfaces and I/O subsystems and (2) used transparently by performance-sensitive applications. This paper provides three contributions to research on high-performance I/O support for QoS-enabled ORB middleware. First, we outline the key research challenges faced by high-performance ORB endsystem developers. Second, we describe how our real-time I/O (RIO) subsystem and pluggable protocol framework enable ORB endsystems to preserve high-performance network interface QoS up to applications running on off-the-shelf hardware and software. Third, we illustrate empirically how highly optimized ORB middleware can be integrated with a real-time I/O subsystem to reduce latency bounds on communication between high-priority clients without unduly penalizing low-priority and best-effort clients. Our results demonstrate how it is possible to develop ORB endsystems that are both highly flexible and highly efficient.
LSMAC vs. LSNAT: Scalable cluster‐based Web servers
Gan, Xuehong; Schroeder, Trevor; Goddard, Steve; Ramamurthy, Byrav
doi: 10.1023/A:1019084304980
pmid: N/A
Server scalability is more important than ever in today's client/server-dominated network environments. Recently, researchers have begun to consider cluster-based computers using commodity hardware as an alternative to expensive specialized hardware for building scalable Web servers. In this paper, we present performance results comparing two cluster-based Web servers based on different server architectures: OSI layer two dispatching (LSMAC) and OSI layer three dispatching (LSNAT). Both cluster-based server systems were implemented as application-space programs running on commodity hardware, in contrast to other similar solutions, which require specialized hardware/software. We point out the advantages and disadvantages of both systems. We also identify when servers should be clustered and when clustering will not improve performance.
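The difference between layer-two and layer-three dispatching can be illustrated with a minimal sketch. It is not the LSMAC or LSNAT implementation: the `Frame` type, the function names, and the addresses are hypothetical, and the sketch only assumes that a layer-two dispatcher redirects a request by rewriting the destination MAC address, while a layer-three dispatcher also rewrites the destination IP address.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    """Hypothetical, heavily simplified view of an incoming request frame."""
    dst_mac: str
    dst_ip: str
    payload: str


def layer2_dispatch(frame: Frame, server_mac: str) -> Frame:
    # Assumption: layer-two dispatching changes only the link-layer
    # destination; the cluster IP address in the packet is left untouched.
    return replace(frame, dst_mac=server_mac)


def layer3_dispatch(frame: Frame, server_mac: str, server_ip: str) -> Frame:
    # Assumption: layer-three dispatching rewrites the network-layer
    # destination to the chosen server's own IP address.
    return replace(frame, dst_mac=server_mac, dst_ip=server_ip)


# Illustrative addresses only.
request = Frame(dst_mac="aa:aa:aa:aa:aa:aa", dst_ip="10.0.0.1", payload="GET /")
via_l2 = layer2_dispatch(request, server_mac="bb:bb:bb:bb:bb:01")
via_l3 = layer3_dispatch(request, server_mac="bb:bb:bb:bb:bb:01", server_ip="10.0.0.11")
```

In this simplified view, the layer-two path leaves the packet's IP header intact, whereas the layer-three path modifies it, which is one plausible source of the differing per-request costs the abstract alludes to.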
Using computational grid capabilities to enhance the capability of an X‐ray source for structural biology
von Laszewski, Gregor; Westbrook, Mary; Barnes, Craig; Foster, Ian; Westbrook, Edwin
doi: 10.1023/A:1019036421819
pmid: N/A
The Advanced Photon Source at Argonne National Laboratory enables structural biologists to perform state-of-the-art crystallography diffraction experiments with high-intensity X-rays. The data gathered during such experiments is used to determine the molecular structure of macromolecules to enhance, for example, the capabilities of modern drug design for basic and applied research. The steps involved in obtaining a complete structure are computationally intensive and require the proper adjustment of a considerable number of parameters that are not known a priori. Thus, it is advantageous to develop a computational infrastructure for solving the numerically complex problems quickly, in order to enable quasi-real-time information discovery and computational steering. Specifically, we propose that the time-consuming calculations be performed in a “computational grid” accessing a large number of state-of-the-art computational facilities. Furthermore, we envision that experiments could be conducted by researchers at their home institutions via remote steering while a beamline technician performs the actual experiment; such an approach would be cost-efficient for the user. We conducted a case study involving multiple tasks of a structural biologist, including data acquisition, data reduction, solution of the phase problem, and calculation of the final result (an electron density map), which is subsequently used for modeling of the molecular structure. We developed a parallel program for the data reduction phase that reduces the turnaround time significantly. We also distributed the solution of the phase problem in order to obtain the resulting electron density map more quickly. We used the GUSTO testbed provided by the Globus metacomputing project as the source of the necessary state-of-the-art computational resources, including workstation clusters.
Mixed data and task parallelism with HPF and PVM
Orlando, Salvatore; Palmerini, Paolo; Perego, Raffaele
doi: 10.1023/A:1019088405889
pmid: N/A
We present a framework to design efficient and portable HPF applications which exploit a mixture of task and data parallelism. In the proposed framework, data parallelism is restricted within HPF modules, and task parallelism is achieved by the concurrent execution of several data-parallel modules cooperating through COLTHPF, a coordination layer implemented on top of PVM. COLTHPF can be used independently of the HPF compilation system exploited, and it allows instances of cooperating HPF tasks to be created either statically or at run-time. We claim that COLTHPF can be exploited by means of a simple skeleton-based coordination language and associated compiler to easily express mixed data and task parallel applications runnable on either multicomputers or clusters of workstations. We used a physics application as a test case of our approach for mixing task and data parallelism, and we present the results of several experiments conducted on a cluster of Linux SMPs.
Dynamic Max‐Min fairness in ring networks
Anastasi, G.; Lenzini, L.; La Porta, M.; Ofek, Y.
doi: 10.1023/A:1019040522727
pmid: N/A
Ring networks are enjoying renewed interest as Storage Area Networks (SANs), i.e., networks for interconnecting storage devices (e.g., disks, disk arrays, and tape drives) and storage data clients. This paper addresses the problem of fairness in ring networks with spatial reuse operating under dynamic traffic scenarios. To this end, in the first part of the paper the Max-Min fairness definition is extended to dynamic traffic scenarios and an algorithm for computing Max-Min fair rates in a dynamic environment is introduced. In the second part of the paper the extended Max-Min fairness definition is used as a measure to compare the performance in dynamic conditions of three fairness algorithms proposed for ring-based SANs. These algorithms are characterized by different fairness cycle sizes (number of links involved in each instance of the fairness algorithm), i.e., different complexity. The results show that the performance increases as the fairness cycle size decreases. In particular, the Global-cycle algorithm (implemented in the Serial Storage Architecture, SSA), whose cycle size is equal to the number N of links in the ring, exhibits the lowest performance, while the One-cycle algorithm, so called because its cycle size equals 1, has the best performance. The Variable-cycle algorithm, whose cycle size changes between 1 and N links, performs in between and provides the best tradeoff between performance and complexity.
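For readers unfamiliar with the Max-Min fairness criterion, the sketch below computes Max-Min fair rates for a fixed set of flows using the classic progressive-filling procedure. It covers only the static case, not the dynamic extension introduced in the paper; the function name, the link and flow labels, and the unit capacities are all illustrative assumptions, and every flow is assumed to traverse at least one link.

```python
def max_min_fair_rates(capacity, routes):
    """Static Max-Min fair rates via progressive filling.

    capacity: dict mapping link -> capacity
    routes:   dict mapping flow -> set of links the flow traverses
              (assumed non-empty for every flow)
    """
    remaining = dict(capacity)                      # capacity not yet allocated
    active = {f: set(links) for f, links in routes.items()}
    rates = {}
    while active:
        # Equal share each link could give to the active flows crossing it.
        share = {}
        for link, cap in remaining.items():
            n = sum(1 for links in active.values() if link in links)
            if n:
                share[link] = cap / n
        # The link with the smallest share is the current bottleneck.
        bottleneck = min(share, key=share.get)
        rate = share[bottleneck]
        # Freeze every flow crossing the bottleneck at that rate.
        for flow in [f for f, links in active.items() if bottleneck in links]:
            rates[flow] = rate
            for link in active[flow]:
                remaining[link] -= rate
            del active[flow]
    return rates


# Hypothetical 4-link ring with unit-capacity links.
ring_capacity = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
ring_routes = {"A": {0, 1}, "B": {1}, "C": {2, 3}, "D": {1}}
ring_rates = max_min_fair_rates(ring_capacity, ring_routes)
```

In this example, flows A, B, and D share link 1 and each receive one third of its capacity, while flow C uses disjoint links and receives the full capacity, which illustrates the effect of spatial reuse on the fair allocation.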
Concurrent single stepping in event‐visualization tools
Kunz, Thomas; Khouzam, Marc
doi: 10.1023/A:1019092506798
pmid: N/A
Event visualization tools are commonly used to facilitate the debugging of parallel or distributed applications, but they are insufficient for full debugging purposes. The need for traditional debugging operations, such as single stepping, is often overlooked in these tools. When integrating such operations, the issue of concurrency needs to be addressed. This paper justifies and describes three single-stepping operations that we found suitable for partially ordered executions: global-step, step-over, and step-in. The description of these operations is based on a sound theoretical framework. This framework can serve as a basis to extend the operations to deal with specific properties of event visualization tools. For example, abstraction techniques are often used to reduce the overwhelming amount of detail presented to the user when visualizing non-trivial executions. These abstraction operations introduce additional problems for single stepping. The paper discusses the problems induced by two different abstraction operations in the context of a specific event visualization tool, Poet, and describes how the single-stepping operations are adapted to deal with these problems.