A logic of authentication
Burrows, M.; Abadi, M.; Needham, R.
doi: 10.1145/74851.74852
Authentication protocols are the basis of security in many distributed systems, and it is therefore essential to ensure that these protocols function correctly. Unfortunately, their design has been extremely error prone. Most of the protocols found in the literature contain redundancies or security flaws. A simple logic has allowed us to describe the beliefs of trustworthy parties involved in authentication protocols and the evolution of these beliefs as a consequence of communication. We have been able to explain a variety of authentication protocols formally, to discover subtleties and errors in them, and to suggest improvements. In this paper, we present the logic and then give the results of our analysis of four published protocols, chosen either because of their practical importance or because they serve to illustrate our method.
Reducing risks from poorly chosen keys
Lomas, T.; Gong, L.; Saltzer, J.; Needham, R.
doi: 10.1145/74851.74853
It is well-known that, left to themselves, people will choose passwords that can be rather readily guessed. If this is done, they are usually vulnerable to an attack based on copying the content of messages forming part of an authentication protocol and experimenting, e.g. with a dictionary, offline. The most usual counter to this threat is to require people to use passwords which are obscure, or even to insist on the system choosing their passwords for them. In this paper we show alternatively how to construct an authentication protocol in which offline experimentation is impracticable; any attack based on experiment must involve the real authentication server and is thus open to detection by the server noticing multiple attempts.
Simple but effective techniques for NUMA memory management
Bolosky, W.; Fitzgerald, R.; Scott, M.
doi: 10.1145/74851.74854
Multiprocessors with non-uniform memory access times introduce the problem of placing data near the processes that use them, in order to improve performance. We have implemented an automatic page placement strategy in the Mach operating system on the IBM ACE multiprocessor workstation. Our experience indicates that even very simple automatic strategies can produce nearly optimal page placement. It also suggests that the greatest leverage for further performance improvement lies in reducing false sharing, which occurs when the same page contains objects that would best be placed in different memories.
The implementation of a coherent memory abstraction on a NUMA multiprocessor: experiences with PLATINUM
Cox, A.; Fowler, R.
doi: 10.1145/74851.74855
PLATINUM is an operating system kernel with a novel memory management system for Non-Uniform Memory Access (NUMA) multiprocessor architectures. This memory management system implements a coherent memory abstraction. Coherent memory is uniformly accessible from all processors in the system. When used by applications coded with appropriate programming styles it appears to be nearly as fast as local physical memory and it reduces memory contention. Coherent memory makes programming NUMA multiprocessors easier for the user while attaining a level of performance comparable with hand-tuned programs. This paper describes the design and implementation of the PLATINUM memory management system, emphasizing the coherent memory. We measure the cost of basic operations implementing the coherent memory. We also measure the performance of a set of application programs running on PLATINUM. Finally, we comment on the interaction between architecture and the coherent memory system. PLATINUM currently runs on the BBN Butterfly Plus Multiprocessor.
Spritely NFS: experiments with cache-consistency protocols
Srinivasan, V.; Mogul, J.
doi: 10.1145/74851.74856
File caching is essential to good performance in a distributed system, especially as processor speeds and memory sizes continue to improve rapidly while disk latencies do not. Stateless-server systems, such as NFS, cannot properly manage client file caches. Stateful systems, such as Sprite, can use explicit cache consistency protocols to improve both cache consistency and overall performance. By modifying NFS to use the Sprite cache consistency protocols, we isolate the effects of the consistency mechanism from the other features of Sprite. We find dramatic improvements on some, although not all, benchmarks, suggesting that an explicit cache consistency protocol is necessary for both correctness and good performance.
Exploiting read-mostly workloads in the FileNet file system
Edwards, D.; McKendry, M.
doi: 10.1145/74851.74857
Most recent studies of file system workloads have focussed on loads imposed by general computing. This paper introduces a significantly different workload imposed by a distributed application system. The FileNet system is a distributed application system that supports document image processing. The FileNet file system was designed to support the workload imposed by this application. To characterize the read-mostly workload applied to the file system and how it differs from general computing environments, we present statistics gathered from live production installations. We contrast these statistics with previously published data for more general computing. We describe the key algorithms of the file system, focusing on the caching approach. A bimodal client caching approach is employed, to match the file modification patterns observed. Different cache consistency algorithms are used depending on usage patterns observed for each file. Under most conditions, files cached at workstations can be accessed without contacting servers. When a file is subject to frequent modification that causes excessive cache consistency traffic, caching is disabled for that file, and servers participate in all open and close activities. The data from production sites is examined to evaluate the success of the approach under its applied load. Contrasts with alternative approaches are made based on this data.
Improving the efficiency of UNIX buffer caches
Braunstein, A.; Riley, M.; Wilkes, J.
doi: 10.1145/74851.74858
This paper reports on the effects of using hardware virtual memory assists in managing file buffer caches in UNIX. A controlled experimental environment was constructed from two systems whose only difference was that one of them (XMF) used the virtual memory hardware to assist file buffer cache search and retrieval. An extensive series of performance characterizations was used to study the effects of varying the buffer cache size (from 3 MB to 70 MB); I/O transfer sizes (from 4 bytes to 64 KB); cache-resident and non-cache-resident data; READs and WRITEs; and a range of application programs. The results: small READ/WRITE transfers from the cache (1 KB) were 50% faster under XMF, while larger transfers (8 KB) were 20% faster. Retrieving data from disk, the XMF improvement was 25% and 10% respectively, although OPEN/CLOSE system calls took slightly longer in XMF. Some individual programs ran as much as 40% faster on XMF, while an application benchmark suite showed a 7-15% improvement in overall execution time. Perhaps surprisingly, XMF had fewer translation lookaside buffer misses.
Performance of Firefly RPC
Schroeder, M.; Burrows, M.
doi: 10.1145/74851.74859
In this paper, we report on the performance of the remote procedure call implementation for the Firefly multiprocessor and analyze the implementation to account precisely for all measured latency. From the analysis and measurements, we estimate how much faster RPC could be if certain improvements were made. The elapsed time for an inter-machine call to a remote procedure that accepts no arguments and produces no results is 2.66 milliseconds. The elapsed time for an RPC that has a single 1440-byte result (the maximum result that will fit in a single packet) is 6.35 milliseconds. Maximum inter-machine throughput of application program data using RPC is 4.65 megabits/second, achieved with 4 threads making parallel RPCs that return the maximum sized result that fits in a single RPC result packet. CPU utilization at maximum throughput is about 1.2 CPU seconds per second on the calling machine and a little less on the server. These measurements are for RPCs from user space on one machine to user space on another, using the installed system and a 10 megabit/second Ethernet. The RPC packet exchange protocol is built on IP/UDP, and the times include calculating and verifying UDP checksums. The Fireflies used in the tests had 5 MicroVAX II processors and a DEQNA Ethernet controller.
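The latency and throughput figures quoted in this abstract can be related with a little arithmetic. The sketch below is a back-of-envelope calculation, not taken from the paper: the constant and function names are hypothetical, and it assumes a megabit means 10^6 bits and that a single thread issues calls strictly one after another.

```python
# Figures quoted in the abstract.
MAX_RESULT_BYTES = 1440      # largest result that fits in one packet
MAX_RESULT_CALL_MS = 6.35    # elapsed time for one such call
PEAK_MBITS = 4.65            # reported peak throughput with 4 threads

def single_thread_mbits(payload_bytes, latency_ms):
    """Throughput of one thread making back-to-back calls, in Mbit/s.

    Assumption: 1 megabit = 10**6 bits and no overlap between calls.
    """
    bits = payload_bytes * 8
    seconds = latency_ms / 1000.0
    return bits / seconds / 1e6

single = single_thread_mbits(MAX_RESULT_BYTES, MAX_RESULT_CALL_MS)

# How much the 4-thread peak improves on one sequential caller.
speedup = PEAK_MBITS / single

# Maximum-size results delivered per second at the reported peak.
calls_per_second = PEAK_MBITS * 1e6 / (MAX_RESULT_BYTES * 8)

print(f"single thread: {single:.2f} Mbit/s")
print(f"peak vs. single thread: {speedup:.2f}x")
print(f"calls per second at peak: {calls_per_second:.0f}")
```

Under these assumptions one sequential caller would move roughly 1.8 megabits/second, so the reported 4-thread peak is about 2.6 times that rather than 4 times.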
RPC in the x-Kernel: evaluating new design techniques
Peterson, L.; Hutchinson, N.; O'Malley, S.; Abbott, M.
doi: 10.1145/74851.74860
This paper reports our experiences implementing remote procedure call (RPC) protocols in the x-kernel. This exercise is interesting because the RPC protocols exploit two novel design techniques: virtual protocols and layered protocols. These techniques are made possible because the x-kernel provides an object-oriented infrastructure that supports three significant features: a uniform interface to all protocols, a late binding between protocol layers, and a small overhead for invoking any given protocol layer. For each design technique, the paper motivates the technique with a concrete example, describes how it is applied to the implementation of RPC protocols, and presents the results of experiments designed to evaluate the technique.
Lightweight remote procedure call
Bershad, B.; Anderson, T.; Lazowska, E.; Levy, H.
doi: 10.1145/74851.74861
Lightweight Remote Procedure Call (LRPC) is a communication facility designed and optimized for communication between protection domains on the same machine. In contemporary small-kernel operating systems, existing RPC systems incur an unnecessarily high cost when used for the type of communication that predominates between protection domains on the same machine. This cost leads system designers to coalesce weakly-related subsystems into the same protection domain, trading safety for performance. By reducing the overhead of same-machine communication, LRPC encourages both safety and performance. LRPC combines the control transfer and communication model of capability systems with the programming semantics and large-grained protection model of RPC. LRPC achieves a factor of three performance improvement over more traditional approaches based on independent threads exchanging messages, reducing the cost of same-machine communication to nearly the lower bound imposed by conventional hardware. LRPC has been integrated into the Taos operating system of the DEC SRC Firefly multiprocessor workstation.