In [2], we proposed the load balanced Birkhoff-von Neumann switch with one-stage buffering (see Figure 1). Such a switch consists of two stages of crossbar switching fabrics and one stage of buffering. The buffer at the input port of the second stage uses the Virtual Output Queueing (VOQ) technique to solve the head-of-line blocking problem. In such a switch, all packets have the same size. Also, time is slotted and synchronized so that exactly one packet can be transmitted within a time slot. In each time slot, both crossbar switches set up connection patterns corresponding to permutation matrices that are periodically generated from a one-cycle permutation matrix. The reasoning behind this architecture is as follows. Since the connection patterns are periodic, packets from the same input port of the first stage are distributed in a round-robin fashion to the second stage according to their arrival times. Thus, the first stage performs load balancing for the incoming traffic. As the traffic entering the second stage is load balanced, it suffices to use simple periodic connection patterns to perform switching at the second stage. This is shown in [2] as a special case of the original Birkhoff-von Neumann decomposition used in [1]. Such an architecture has several advantages, including scalability, low hardware complexity, 100% throughput, low average delay under heavy load and bursty traffic, and efficient buffer usage. However, the main drawback of the load balanced Birkhoff-von Neumann switch with one-stage buffering is that packets might be out of sequence. The main objective of this paper is to solve this out-of-sequence problem. One quick fix is to add a resequencing-and-output buffer after the second stage.
However, as packets are distributed according to their arrival times at the first stage, there is no guarantee on the size of the resequencing-and-output buffer needed to prevent packet losses. For this, one needs to distribute packets according to their flows, as indicated in the paper by Iyer and McKeown [5]. This is done by adding a flow splitter and a load-balancing buffer in front of the first stage (see Figure 2). For an N x N switch, the load-balancing buffer at each input port of the first stage consists of N virtual output queues (VOQs) destined for the N output ports of that stage. Packets from the same flow are split in a round-robin fashion among the N virtual output queues and scheduled under the First Come First Served (FCFS) policy. By so doing, load balancing can be achieved for each flow, as packets from the same flow are split almost evenly among the input ports of the second stage. More importantly, as pointed out in [5], the delay and the buffer size of the load-balancing buffer are bounded by constants that depend only on the size of the switch and the number of flows. The resequencing-and-output buffer after the second stage not only performs resequencing to keep packets in sequence, but also stores packets waiting for transmission on the output links. In this paper, we consider a traffic model with multicasting flows, which is more general than the point-to-point traffic model in [5]. A multicasting flow is a stream of packets that has one common input and a set of common outputs. For multicasting flows, fanout splitting (see, e.g., [4]) is performed at the central buffers (the VOQs in front of the second stage). The central buffers are assumed to be infinite so that no packets are lost in the switch. We consider two types of scheduling policies in the central buffers: the FCFS policy and the Earliest Deadline First (EDF) policy. For the FCFS policy, a jitter control mechanism is added in the VOQs in front of the second stage.
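A per-flow splitter feeding the FCFS load-balancing buffer might look like the following Python sketch. The class and method names are hypothetical. Each flow is assumed to keep its own round-robin pointer starting at VOQ 0, and VOQ $k$ is assumed to be served in slot $t$ exactly when the first fabric connects this input to output $k$ (using the cyclic-shift pattern $(i+t) \bmod N$). Fanout splitting for multicast flows at the central buffers is not modeled.

```python
from collections import defaultdict, deque


class LoadBalancingBuffer:
    """Hypothetical flow splitter plus load-balancing buffer at first-stage
    input `port` of an n x n switch.

    Assumptions (not from the paper): each flow keeps its own round-robin
    pointer starting at VOQ 0, and VOQ k is served in slot t when the first
    fabric connects this input to output k, i.e. when (port + t) % n == k.
    """

    def __init__(self, port: int, n: int):
        self.port = port
        self.n = n
        self.voqs = [deque() for _ in range(n)]  # one FCFS queue per first-stage output
        self.next_voq = defaultdict(int)         # per-flow round-robin pointer

    def arrive(self, flow, packet) -> int:
        """Flow splitter: append the packet to its flow's next VOQ and return that VOQ."""
        k = self.next_voq[flow]
        self.voqs[k].append((flow, packet))
        self.next_voq[flow] = (k + 1) % self.n
        return k

    def serve(self, t: int):
        """Send the head-of-line packet of the VOQ connected in slot t, or None if it is empty."""
        k = (self.port + t) % self.n
        return self.voqs[k].popleft() if self.voqs[k] else None


if __name__ == "__main__":
    buf = LoadBalancingBuffer(port=0, n=3)
    print([buf.arrive("f1", seq) for seq in range(5)])  # [0, 1, 2, 0, 1]
    print([buf.serve(t) for t in range(6)])
    # [('f1', 0), ('f1', 1), ('f1', 2), ('f1', 3), ('f1', 4), None]
```

Because the pointer is kept per flow rather than per input, packets of every flow are spread almost evenly over the second stage regardless of how the flows interleave in time.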
Such a jitter control mechanism delays every packet to its maximum delay at the first stage so that the flows entering the second stage are simply time-shifted versions of the original ones. Our main result for the FCFS scheme with jitter control is the following theorem, whose proof is given in the full report [3].
Theorem 1. Suppose that all the buffers are empty at time 0. Then the following hold for the FCFS scheme with jitter control.
(i) The end-to-end delay for a packet through our switch with multi-stage buffering is bounded above by the sum of the delay through the corresponding FCFS output-buffered switch and $N L_{\max} + (N+1) M_{\max}$, where $L_{\max}$ (resp. $M_{\max}$) is the maximum number of flows at an input (resp. output) port.
(ii) The size of the load-balancing buffer at an input port of the first stage is bounded above by $N L_{\max}$.
(iii) The delay through the load-balancing buffer at an input port of the first stage is bounded above by $N L_{\max}$.
(iv) The size of the resequencing-and-output buffer at an output port of the second stage is bounded above by $(N+1) M_{\max}$.
In the EDF scheme (see Figure 3), every packet is assigned a deadline equal to its departure time from the corresponding FCFS output-buffered switch. Packets are scheduled according to their deadlines in the central buffers. The EDF scheme does not need the jitter control mechanism of the FCFS scheme. As a result, the average packet delay can be greatly reduced. However, without jitter control, one might need a larger resequencing buffer than in the FCFS scheme with jitter control. Since the first stage is the same as in the FCFS scheme, both the delay and the buffer size of the load-balancing buffer are still bounded by $N L_{\max}$. Moreover, we show the following theorem for the EDF scheme; its proof is given in the full report [3].
Theorem 2. Suppose that all the buffers are empty at time 0. Then the following hold for the EDF scheme.
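The jitter control stage can be sketched as a hold-until-release buffer in front of a central FCFS queue. In this hypothetical Python sketch, a packet that entered the switch in slot $a$ is released in slot $a + D$, where $D$ is an assumed upper bound on the first-stage delay; taking $D = N L_{\max}$ from Theorem 1(iii) is our assumption. Releasing every packet at a fixed offset from its arrival is what makes the flows entering the second stage time-shifted copies of the original ones.

```python
import heapq
import itertools


class JitterControl:
    """Hypothetical jitter-control stage in front of a central (second-stage) VOQ.

    Assumption: a packet that entered the switch in slot `arrival` was delayed
    by at most d_max slots in the first stage (e.g. d_max = N * L_max), and it
    is held until slot arrival + d_max before joining the FCFS queue.
    """

    def __init__(self, d_max: int):
        self.d_max = d_max
        self._held = []                # min-heap of (release_slot, tie, packet)
        self._tie = itertools.count()  # keeps receipt order among equal release slots

    def receive(self, packet, arrival: int, now: int) -> None:
        """Accept a packet reaching the central buffer in slot `now`."""
        if now - arrival > self.d_max:
            raise ValueError("first-stage delay exceeded d_max")
        heapq.heappush(self._held, (arrival + self.d_max, next(self._tie), packet))

    def release(self, now: int) -> list:
        """Return the packets whose release slot has been reached, in release order."""
        out = []
        while self._held and self._held[0][0] <= now:
            out.append(heapq.heappop(self._held)[2])
        return out


if __name__ == "__main__":
    n, l_max = 2, 3
    jc = JitterControl(d_max=n * l_max)
    # (packet, arrival slot at the switch, slot it reaches the central buffer)
    traffic = [("p0", 0, 5), ("p1", 1, 2), ("p2", 4, 6)]
    released = []
    for now in range(12):
        for pkt, arr, reach in traffic:
            if reach == now:
                jc.receive(pkt, arr, now)
        released += [(now, p) for p in jc.release(now)]
    print(released)  # [(6, 'p0'), (7, 'p1'), (10, 'p2')]
```

In the example run, packets that arrived in slots 0, 1 and 4 leave in slots 6, 7 and 10, so their original spacing is preserved even though their first-stage delays (5, 1 and 2 slots) differ.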
(i) The end-to-end delay for a packet through our switch with multi-stage buffering is bounded above by the sum of the delay through the corresponding FCFS output-buffered switch and $N(L_{\max} + M_{\max})$.
(ii) The size of the resequencing-and-output buffer at an output port of the second stage is bounded above by $N(L_{\max} + M_{\max})$.
Computing the departure times from the corresponding FCFS output-buffered switch requires global information about all the inputs. A simple alternative is to use the packet arrival times as deadlines. The EDF scheme based on arrival times then yields the same departure order as the FCFS output-buffered switch, except for packets that arrive at the same time. Since at most $M_{\max}$ packets destined for the same output port of the corresponding output-buffered switch can arrive at the same time, the end-to-end delay for a packet through the multi-stage switch using arrival times as deadlines is bounded above by the sum of the delay through the corresponding FCFS output-buffered switch and $N(L_{\max} + M_{\max}) + M_{\max} = N L_{\max} + (N+1) M_{\max}$, which exceeds the bound of Theorem 2 by exactly $M_{\max}$. Also, the size of the resequencing-and-output buffer at an output port of the second stage in this case is bounded above by $N L_{\max} + (N+1) M_{\max}$.
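The EDF central buffer with arrival-time deadlines, together with the delay bounds above, can be sketched in Python as follows. The names are hypothetical, and breaking ties among packets with equal deadlines by insertion order is our assumption. The helper `extra_delay_bound` only restates the bounds of Theorems 1 and 2 and of the arrival-time variant, computing the latter as the Theorem 2 bound plus $M_{\max}$ for same-slot ties.

```python
import heapq
import itertools


class EDFQueue:
    """Hypothetical central-buffer VOQ served Earliest Deadline First.

    Deadlines are packet arrival times at the switch, the simplification
    described in the text. Ties are broken by insertion order (an assumption).
    """

    def __init__(self):
        self._heap = []                # min-heap of (deadline, tie, packet)
        self._tie = itertools.count()

    def push(self, packet, deadline: int) -> None:
        heapq.heappush(self._heap, (deadline, next(self._tie), packet))

    def pop(self):
        """Remove and return the packet with the earliest deadline, or None."""
        return heapq.heappop(self._heap)[2] if self._heap else None


def extra_delay_bound(n: int, l_max: int, m_max: int, scheme: str) -> int:
    """Bound (in slots) on the delay beyond the FCFS output-buffered switch."""
    if scheme == "fcfs-jitter":  # Theorem 1(i)
        return n * l_max + (n + 1) * m_max
    if scheme == "edf":          # Theorem 2(i)
        return n * (l_max + m_max)
    if scheme == "edf-arrival":  # arrival times as deadlines:
        # at most m_max same-slot ties per output on top of Theorem 2
        return extra_delay_bound(n, l_max, m_max, "edf") + m_max
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    q = EDFQueue()
    for pkt, arrival in [("a", 5), ("b", 3), ("c", 3), ("d", 4)]:
        q.push(pkt, arrival)
    print([q.pop() for _ in range(4)])  # ['b', 'c', 'd', 'a']
    for s in ("fcfs-jitter", "edf", "edf-arrival"):
        print(s, extra_delay_bound(4, 2, 3, s))
```

For $N = 4$, $L_{\max} = 2$ and $M_{\max} = 3$, the helper gives 23, 20 and 23 slots for the three cases, so using arrival times as deadlines costs exactly $M_{\max}$ slots relative to Theorem 2 and matches the bound of the FCFS scheme with jitter control.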