Abstract
Flexible network management requires explicit control of the exact paths taken by different network flows. Whatever the means used to achieve this (e.g. Multiprotocol Label Switching, Virtual Local Area Networks or OpenFlow), this need may lead to an explosion of entries in the forwarding tables of network equipment. In this article, we present an algorithm that aggregates many network paths into a reduced number of trees, thus shrinking the forwarding state in switching devices. Path aggregation algorithms are often deployed to reduce data-plane state under different routing approaches, and the presented algorithm achieves better results than existing algorithms with similar goals. Additionally, we show that most types of popular routing and switching equipment, even when running off-the-shelf routing software, may be used to implement multi-path routing with trees. This highlights the applicability of the proposed algorithm and its significance in light of the current trend of separating the data and control planes in modern networks.

1. INTRODUCTION
Simple and sound engineering principles have been at the heart of the Internet's success. Nevertheless, for the last 20 years, the Internet's core routing technologies have been challenged by several new requirements, such as huge scale, service diversity, performance optimization (e.g. load balancing and traffic engineering), traffic steering and security (e.g. traffic may be rerouted to middleboxes in charge of its analysis and control), and fault tolerance. These new requirements have had a significant impact on the complexity of backbone networks. However, simplicity, stability and flexibility of routing control and of routing devices remain key ingredients of successful network operations. Routing can be based on shortest paths [1, 2], a simple and popular strategy that falls short when the goal is close and flexible control over the paths of different packet flows and network optimization.
However, it supports load balancing and multi-path routing in data centres [3]. In enterprise networks, the Spanning Tree Protocol (STP) and Virtual Local Area Networks (VLANs) are popular, albeit suboptimal, solutions, which also support isolation and multi-path routing in data centres [4]. When the goal is performance optimization, one may resort to traffic engineering and load balancing across several paths [5]. Security may also require strict control of the paths followed by some flows, and reliability demands the availability of alternative paths. Owing to the popularity of Multiprotocol Label Switching (MPLS) for supporting customer Virtual Private Networks (VPNs), and to its ability to support everything from fine-grained traffic control to carrier-grade requirements, the most popular solution adopted in complex backbones relies on multi-path routing implemented by means of Label Switched Paths (LSPs) [6] or other types of tunnels (e.g. Generic Routing Encapsulation, GRE [7]). The need for traffic optimization and strict control is now common to several types of networks besides traditional Internet Service Provider (ISP) networks, such as intra- [3, 4, 8] and inter- [9, 10] data centre networks. Scale and complexity have been growing continuously. This state of affairs calls for simpler and more effective engineering solutions, as well as for core routers shielded from the complexity of dynamic route (re)computation, as proposed in [11, 12]. This is the direction taken by the Software Defined Networking (SDN) approach, which aims at ‘separating routing from routers’ [13–15] to allow a logically centralized and more flexible control of the way different packet flows are routed. Some recent publications report traffic engineering and control experiences that go in that direction, e.g. [9, 10, 16, 17].
All of them share part or most of the following tenets:
(i) keep core routers and switches as simple as possible and focused on forwarding; their main purpose is to make the different needed paths available; when faults occur, they only report them and let the edge (e.g. edge routers, controllers or servers in data centres) adapt to the new situation;
(ii) concentrate any required in-network complexity in edge devices, which are in charge of load-balancing incoming flows across the available paths; their flow distribution policies may be precomputed [16] or managed by logically centralized controllers [9, 10];
(iii) execute traffic optimization and other control algorithms offline [16] or in a logically centralized network controller [9, 10] that is aware of the global network status and traffic demands.
Even when an SDN approach is used to steer traffic routing in large networks, packet flows are not individually routed, for scalability reasons [15]. Instead, flows are partitioned into sets, sometimes called trunks, which are routed through tunnels. This approach is frequent in large backbones and interconnection networks. In the rest of the paper, the terms flow, path and tunnel will be used interchangeably. Whatever the requirements, the number of selected paths in these networks is very large: in a network with n ingress/egress nodes, using on average k different paths for each pair of edge nodes results in $O(kn^2)$ paths. As it is unfeasible to switch to new approaches by replacing overnight all network equipment and the skills of its human operators, these paths should be set up with off-the-shelf equipment and known protocols. Besides, there is a requirement common to these different backbone ecosystems: to reduce the data-plane complexity of the core switching equipment as much as possible.
Thus, whatever the technology used to control the backbone (traditional shortest-path protocols, the Spanning Tree Protocol and VLANs, MPLS or other types of tunnels, OpenFlow, etc.), reducing data-plane complexity allows the use of cheaper equipment. In this paper, we present an algorithm, called BOUQUET, for aggregating a large number of paths into a reduced number of trees. BOUQUET outperforms similar algorithms presented in the literature, against which it has been extensively tested and compared on several kinds of networks. We also show that most types of popular routing and switching equipment, even when running off-the-shelf routing software, may be used to implement multi-path routing with trees. This highlights the applicability of the novel algorithm and its significance in light of the current trend towards managing networks by separating the data and control planes. In Section 2, the problem is formalized and some known solutions are described. Then, BOUQUET is introduced and analysed in Section 3. Since BOUQUET is a polynomial-time algorithm for an NP-hard problem, it requires a careful evaluation, which is presented in Section 4. In Section 5, we discuss how multi-path routing with trees may be implemented with off-the-shelf equipment and how it can be used in the transition to SDN. Finally, Section 6 concludes the article.

2. MINIMUM TREE PACKING PROBLEM AND SOME KNOWN SOLUTIONS
Consider a network modelled by a graph G=(V,E) that is undirected (so (v,w) and (w,v) denote the same edge), simple (there are neither loops nor parallel edges), connected (there is some path between any two nodes) and weighted, each edge having a positive weight. Additionally, let N⊆V be the set of edge nodes originating and terminating traffic, and let n=∣N∣.
For every pair $(x,y)\in N^2$, with x<y (for any total order on N), approximately k>0 distinct simple (loop-free) paths from x to y must be computed beforehand in order to support multi-path routing. The set S of all these paths has size $|S| \approx kn(n-1)/2$. These paths can be aggregated into a reduced set of trees covering them. A tree t is an undirected, simple, connected and acyclic graph¹, and t covers a path p if p is a path in t. Figure 1 depicts three undirected, simple and connected graphs. Network G¯ has six nodes and nine edges, all with weight 1. The set of ingress/egress nodes is N={1,3,4,5,6}. Both subgraphs (t¯1 and g¯c) cover paths 4 5 and 5 4 2 6, but neither covers 4 5 6.

FIGURE 1. G¯ is a network whose links all have weight 1 and whose set of ingress/egress nodes is N={1,3,4,5,6}. t¯1 is a tree because it is connected and acyclic. g¯c is not a tree because it has cycles (e.g. path 4 5 2 4 is a cycle).

Multi-path routing for traffic engineering requires a very large number of different paths, of order $kn^2$. Moreover, different classes of traffic and other requirements may further increase the number of paths. To avoid complex and dynamic routing algorithms in the core of the network, these paths must be mostly preconfigured, and reducing the state required to set them up (i.e. the size of the Forwarding Information Base, FIB, in routers) is an important goal. As will be argued in Section 5, using trees for this aggregation is one of the best solutions. Unfortunately, determining the minimum number of trees needed to cover a set of paths is an NP-hard problem.
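The notions of tree and cover can be checked mechanically. The Python sketch below is our own illustration, not part of the original work: a graph is represented by its set of undirected edges, the helper names (path_edges, is_tree, covers) are hypothetical, and the edge sets are taken from the text (t¯1 is induced by path 5 4 2 6, and g¯c is t¯1 plus edge (2,5), as discussed for Fig. 2).

```python
from collections import defaultdict


def path_edges(path):
    """Undirected edges of the graph induced by a path given as a node sequence."""
    return {frozenset(e) for e in zip(path, path[1:])}


def is_tree(edges):
    """True if the graph formed by `edges` is connected and acyclic."""
    nodes = {v for e in edges for v in e}
    # A connected graph is acyclic iff it has exactly |V| - 1 edges.
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj = defaultdict(set)
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:  # depth-first search to test connectivity
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == nodes


def covers(graph_edges, path):
    """A graph covers a path if every edge of the path is an edge of the graph."""
    return path_edges(path) <= graph_edges


# Figure 1: t1 is induced by path 5 4 2 6; g_c additionally contains edge (2, 5).
t1 = path_edges([5, 4, 2, 6])
g_c = t1 | {frozenset((2, 5))}

print(is_tree(t1), is_tree(g_c))  # True False
for g in (t1, g_c):
    print(covers(g, [4, 5]), covers(g, [5, 4, 2, 6]), covers(g, [4, 5, 6]))  # True True False
```

Counting edges before running the search is a shortcut: a graph with |V| nodes is a tree exactly when it is connected and has |V|−1 edges.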
Definition 2.1 (Minimum Tree Packing Problem (min-TP)). Given a set S of simple paths in an undirected, simple and connected graph G, compute a set T of trees, with minimum size, such that each path in S is a path in some tree of T.

Deciding whether a set of paths with the same destination node can be aggregated into m≥1 undirected acyclic graphs is an NP-complete problem [18]. But any undirected acyclic graph that covers one or more such paths is a tree (unless it has superfluous connected components, which may be removed), because all the paths share the destination node. Therefore, min-TP is NP-hard, even when the input is restricted to paths with the same destination, and the known polynomial-time algorithms for solving it do not guarantee that the computed set of trees has minimum size. The size of the returned set is thus the main metric to evaluate the quality of these algorithms. The restricted version of the problem (restrict-min-TP), where all paths have the same destination, has been studied in the context of computing MPLS multipoint-to-point LSPs. In [19], it is formulated as a 0–1 integer linear programming problem. A greedy algorithm is proposed in [20], which basically aggregates paths (and trees) in decreasing order of the length of their longest ‘common suffixes’. Although min-TP could be tackled with those algorithms, the final number of trees would be far from minimal. In general, S has about k paths for each pair $(x,y)\in N^2$, with x<y. Thus, $|S| \approx kn(n-1)/2$, where n=∣N∣. Partitioning the paths in S by destination node would give rise to n−1 instances of restrict-min-TP (the smallest node in the order is never a destination) and, for each one, at least k trees would be computed, because the k paths with the same origin and destination would have to be covered by different trees (the union of two distinct simple paths with the same endpoints contains a cycle). Therefore, assuming that no tree is repeated across instances, the final number of trees would be at least (n−1)k. This strategy is dubbed divide-by-destination in Section 4.2.
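The divide-by-destination bound is simple arithmetic. The sketch below uses hypothetical helper names and evaluates it for the 12-node full mesh of Section 4.1.1, where each pair has one direct path and n−2 two-hop paths (so k=n−1), contrasting it with the 12 trees that Table 2 reports as optimal for that network.

```python
def num_paths(n, k):
    """|S| for k paths per unordered pair of the n ingress/egress nodes: k*n(n-1)/2."""
    return k * n * (n - 1) // 2


def divide_by_destination_min_trees(n, k):
    """Lower bound on the number of trees produced by divide-by-destination.

    With x < y, only n - 1 nodes ever act as destinations, and in each of those
    sub-instances the k paths sharing both endpoints need k distinct trees.
    Assumes that no tree is shared between sub-instances.
    """
    return (n - 1) * k


# 12-node full mesh: 1 direct path + (n - 2) two-hop paths per pair, so k = n - 1.
n = 12
k = n - 1
print(num_paths(n, k))                        # 726, the |S| of the full mesh in Table 2
print(divide_by_destination_min_trees(n, k))  # 121, versus min|T| = 12 in Table 2
```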
In the context of the SPAIN system [4, 18], whose goals are similar to ours, Mudigonda et al. developed two randomized algorithms for aggregating paths into acyclic subgraphs (not necessarily connected). Due to their randomized nature, both algorithms must be executed several times, and the smallest computed set of subgraphs is returned. For the sake of precision, let us consider that a path p=v1v2⋯vm (with m≥2) induces the undirected graph Gp=(Vp,Ep), where Vp={v1,v2,…,vm} and Ep={(v1,v2),…,(vm−1,vm)}, i.e. Gp has the nodes and the edges of p. Now, let G′=(V′,E′) be an acyclic subgraph of G and p∈S. G′ is said to contain or cover p if Ep⊆E′, and p is aggregable into G′ if (V′∪Vp,E′∪Ep) is an acyclic graph. The insertion of p into G′ transforms G′ into the graph (V′∪Vp,E′∪Ep). In our example (see Fig. 2), tree t¯1 is the graph induced by path p¯1=5 4 2 6 (Gp¯1=t¯1). Path p¯2=4 2 5 is not aggregable into t¯1, because the insertion of edge (2,5) gives rise to a cycle (the union of t¯1 and Gp¯2 is g¯c). Paths p¯3=1 3 and p¯4=3 1 4 5 can be aggregated into t¯1, since the corresponding insertions result in acyclic graphs (respectively, g¯ and t¯2).

FIGURE 2. Tree t¯1 is the graph induced by path p¯1=5 4 2 6. Path p¯2=4 2 5 is not aggregable into t¯1 because the union of t¯1 and Gp¯2 is the (cyclic) graph g¯c. The disconnected graph g¯ and the tree t¯2 are, respectively, the result of inserting paths p¯3=1 3 and p¯4=3 1 4 5 into tree t¯1.

The first algorithm (SP1) is quite simple [4, 18]. Initially, the set R of subgraphs is empty.
Set S is traversed in random order and each of its paths, p, is treated in turn by testing whether it is covered by some graph in R. If so, p is skipped; otherwise, R is traversed again, in random order, until a graph G′ into which p is aggregable is found, and p is inserted into G′. When no such graph is found, graph Gp becomes a new member of R.

The second algorithm (SP2) is much more complex [18] and its full description is outside the scope of this paper. Essentially, the algorithm has two phases. In the first one, S is partitioned by path destination node and each subproblem is solved by a reduction to the vertex colouring problem. Since the union of all computed sets (which is a solution to the original problem) can have a large number of subgraphs, the second phase tries to merge subgraphs so as to reduce the size of the returned set. This merging step starts by computing a random permutation of all subgraphs, because the order of the input sequence affects the solution size. Although the authors' motivation for designing SP2 is the parallel execution of the first phase (the subproblems being independent), their paper does not compare the two algorithms in terms of either performance or solution quality.

3. THE BOUQUET ALGORITHM
BOUQUET is a greedy algorithm. Starting from an empty set of trees, paths are successively aggregated into the current trees, and a new tree is created only when inserting a path into any existing tree would result in a cyclic or disconnected graph. The main difference from the similar algorithms presented above is that ‘pairs of compatible paths’ are inserted first, in a specific order and into specific trees. The key notion of ‘compatibility’ is defined over two paths, over a path and a tree, and over a pair of paths and a tree. Compatibility of two paths will be used to specify the order in which path pairs are processed.
The two other compatibility types will be used to identify the tree into which a path or a pair of paths is inserted, when there are several alternatives. Recall that the input set, S, is a set of paths in a graph G; the graph induced by a path p∈S is denoted by Gp=(Vp,Ep); and a tree t of G is a connected and acyclic subgraph of G, represented by (Vt,Et).

Definition 3.1 (Compatibility Degree). The compatibility degree of two paths p and q, δ(p,q), is −1 if graph (Vp∪Vq,Ep∪Eq) is cyclic; otherwise, it is ∣Vp∩Vq∣, the number of common nodes. The compatibility degree of a path p and a tree t, δ(p,t), is defined in an identical way: it is −1 if graph (Vt∪Vp,Et∪Ep) is cyclic, and ∣Vt∩Vp∣ otherwise. The compatibility degree of a pair of paths (p,q) and a tree t, δ((p,q),t), is −1 if graph (Vt∪Vp∪Vq,Et∪Ep∪Eq) is cyclic, and ∣Vt∩Vp∣+∣Vt∩Vq∣ otherwise.

Recall Fig. 2, paths p¯1, p¯2, p¯3 and p¯4 (also defined in Table 1), and the fact that Gp¯1=t¯1. Since the union of Gp¯1 and Gp¯2 is a cyclic graph (g¯c), δ(p¯1,p¯2)=δ(p¯2,t¯1)=−1. Besides, the union of Gp¯1 and Gp¯3 and the union of Gp¯1 and Gp¯4 are acyclic graphs (respectively, g¯ and t¯2). So δ(p¯1,p¯3)=δ(p¯3,t¯1)=0, because there is no node in common, and δ(p¯1,p¯4)=δ(p¯4,t¯1)=2, 4 and 5 being the shared nodes.

TABLE 1. Lengths (Len), compatibility degrees (δ) and aggregation potentials (Φ) when the set of paths is S¯={p¯1,p¯2,p¯3,p¯4} (where ‘–’ means irrelevant).

Path   Nodes      Len  Φ
p¯1    5 4 2 6    3    2
p¯2    4 2 5      2    0
p¯3    1 3        1    2
p¯4    3 1 4 5    3    4

Path pair   δ    Φ   Len
(p¯1,p¯2)   −1   –   –
(p¯1,p¯3)   0    –   –
(p¯1,p¯4)   2    6   6
(p¯2,p¯3)   0    –   –
(p¯2,p¯4)   −1   –   –
(p¯3,p¯4)   2    6   4

Two paths (respectively, a path and a tree, or a pair of paths and a tree) are said to be compatible if their compatibility degree is positive. Note that two paths (respectively, a path and a tree) without common nodes are not compatible, because the union of the corresponding graphs is disconnected. However, a pair of paths can be compatible with a tree even if one of the paths does not share any node with the tree (provided the other does). The following properties, which justify the operations on compatible entities, are easy to verify:
(i) if (p,q) is a pair of compatible paths, the graph (Vp∪Vq,Ep∪Eq) created with (p,q) is a tree;
(ii) if a path p and a tree t are compatible, the insertion of p into t, which transforms t into the graph (Vt∪Vp,Et∪Ep), yields a tree;
(iii) if (p,q) is a pair of compatible paths, t is a tree, and (p,q) and t are compatible, the insertion of (p,q) into t, which transforms t into the graph (Vt∪Vp∪Vq,Et∪Ep∪Eq), yields a tree.
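Definitions 3.1 and 3.2 translate directly into code once cycle detection is available. The Python sketch below is our own illustration (the helper names are hypothetical): it detects cycles with a union-find structure, represents paths and trees alike as (nodes, edges) pairs, and reproduces the δ and Φ columns of Table 1 for S¯.

```python
from itertools import combinations


def induced_graph(path):
    """(nodes, undirected edges) of the graph induced by a path."""
    return set(path), {frozenset(e) for e in zip(path, path[1:])}


def is_acyclic(edges):
    """Union-find test: an edge whose endpoints are already connected closes a cycle."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in map(tuple, edges):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
    return True


def delta(x, y):
    """delta(p, q) or delta(p, t) of Definition 3.1; x and y are (nodes, edges) pairs."""
    (vx, ex), (vy, ey) = x, y
    return -1 if not is_acyclic(ex | ey) else len(vx & vy)


def delta_pair_tree(p, q, t):
    """delta((p, q), t) of Definition 3.1."""
    (vp, ep), (vq, eq), (vt, et) = p, q, t
    if not is_acyclic(ep | eq | et):
        return -1
    return len(vt & vp) + len(vt & vq)


def potentials(paths):
    """Aggregation potential Phi(p, S) of Definition 3.2 for every named path."""
    pot = dict.fromkeys(paths, 0)
    for a, b in combinations(paths, 2):
        d = delta(induced_graph(paths[a]), induced_graph(paths[b]))
        if d > 0:
            pot[a] += d
            pot[b] += d
    return pot


S_bar = {"p1": [5, 4, 2, 6], "p2": [4, 2, 5], "p3": [1, 3], "p4": [3, 1, 4, 5]}
graphs = {name: induced_graph(p) for name, p in S_bar.items()}

for a, b in combinations(S_bar, 2):  # the delta column of Table 1
    print(a, b, delta(graphs[a], graphs[b]))
print(potentials(S_bar))  # {'p1': 2, 'p2': 0, 'p3': 2, 'p4': 4}
# p3 shares no node with t1 = G_p1, yet the pair (p3, p4) is compatible with t1.
print(delta_pair_tree(graphs["p3"], graphs["p4"], graphs["p1"]))  # 2
```

Because edges are stored as frozensets, an edge shared by both graphs (such as (1,3) in p¯3 and p¯4) is counted once, so it is not mistaken for a cycle.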
Algorithm 1 BOUQUET Greedy Tree Packing
1: Input: S, the set of paths
2: X←compAndSortCompatPairs(S)
3: T←packCompatPairs(X)
4: Y←compAndSortSinglePaths(S)
5: T←packSinglePaths(T,Y)
6: return T

BOUQUET comprises four main steps, summarized in Algorithm 1. Function compAndSortCompatPairs (outlined in Algorithm 2) returns all compatible pairs of paths sorted, firstly, in decreasing order of compatibility degree, secondly, in decreasing order of ‘aggregation potential’ and, in case of equal compatibility degrees and aggregation potentials, in decreasing order of the ‘length of the pair’ (the sum of the lengths of its paths). The first criterion favours the pairs whose aggregation is more ‘natural’, i.e. those whose aggregation result differs less from each of the paths. The ‘aggregation potential’ gives priority to pairs of paths that share many nodes with their compatible paths.

Definition 3.2 (Aggregation Potential). The aggregation potential of a path p in S, Φ(p,S), is the sum of the compatibility degrees of all compatible pairs of S encompassing p:
$$\Phi(p,S)=\sum_{\{q\in S \,\mid\, q\neq p \,\wedge\, \delta(p,q)>0\}} \delta(p,q).$$
The aggregation potential of a pair of paths (p,q) in S is the sum of the aggregation potentials of p and q in S: Φ((p,q),S)=Φ(p,S)+Φ(q,S).

Table 1 presents the aggregation potentials when the set of paths is S¯={p¯1,p¯2,p¯3,p¯4}. Notice that Φ(p¯1,S¯)=δ(p¯1,p¯4)=2, because p¯1 belongs to just one compatible pair of paths, while Φ(p¯4,S¯)=δ(p¯1,p¯4)+δ(p¯3,p¯4)=4.
Consequently, Φ((p¯1,p¯4),S¯)=2+4=6. Finally, the third criterion aims at treating the longest paths as early as possible, while more alternatives remain, deferring those that are, in principle, easier to aggregate. That is why the output of compAndSortCompatPairs(S¯) is ((p¯1,p¯4),(p¯3,p¯4)): both pairs have compatibility degree 2 and aggregation potential 6, but (p¯1,p¯4) is longer.

In function packCompatPairs (sketched in Algorithm 3), each pair of compatible paths is processed in turn. The processing of a pair starts by checking, for each path of the pair, whether it is covered by one of the existing trees (Lines 4–5). Three cases can arise. When both paths are contained in trees, the processing of the pair ends (Lines 6–7). When neither path is contained in a tree (Lines 8–14), the pair is inserted into an existing tree if some tree is compatible with the pair; otherwise, a new tree is created with the pair. When one of the paths is contained in some tree t and the other, q, is not covered by any tree (Lines 15–20), three situations can hold:
(i) if q is compatible with t, q is inserted into t (Lines 21–22);
(ii) if q is not compatible with t but some tree is compatible with q, q is inserted into an existing tree (Lines 23–27);
(iii) otherwise, no tree is compatible with q, and q is ignored (its aggregation is postponed).
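To show how the steps of Algorithm 1 fit together, here is a compact Python sketch. It is our reading of the descriptions of Algorithms 2 and 3 and of the single-path phase given after them, not the authors' implementation, and all names are hypothetical. Assumptions: ties between equally compatible trees are broken by taking the first one found, coverage is recomputed rather than tracked through path–tree associations, and the length of a path is its number of edges.

```python
from itertools import combinations


def edges_of(path):
    """Undirected edges of the graph induced by a path (a node sequence)."""
    return {frozenset(e) for e in zip(path, path[1:])}


def is_acyclic(edges):
    """Union-find test: an edge whose endpoints are already connected closes a cycle."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in map(tuple, edges):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
    return True


def degree(t, *paths):
    """Compatibility degree of one path, or a pair of paths, with t = (nodes, edges)."""
    nodes, edges = t
    if not is_acyclic(edges.union(*(edges_of(p) for p in paths))):
        return -1
    return sum(len(nodes & set(p)) for p in paths)


def bouquet(S):
    """Sketch of Algorithm 1; S is a list of paths, the result a list of [nodes, edges]."""
    # compAndSortCompatPairs: compatible pairs, their degrees and path potentials.
    pot, pairs = [0] * len(S), []
    for i, j in combinations(range(len(S)), 2):
        d = degree((set(S[i]), edges_of(S[i])), S[j])
        if d > 0:
            pairs.append((d, i, j))
            pot[i] += d
            pot[j] += d
    # Descending by degree, then pair potential, then pair length (in edges).
    pairs.sort(key=lambda x: (x[0], pot[x[1]] + pot[x[2]], len(S[x[1]]) + len(S[x[2]]) - 2),
               reverse=True)
    trees = []

    def covering(p):  # findCoveringTree
        return next((t for t in trees if edges_of(p) <= t[1]), None)

    def best_tree(*paths):  # findCompatTreeWithPath / findCompatTreeWithPair
        best, best_d = None, 0
        for t in trees:
            d = degree(t, *paths)
            if d > best_d:
                best, best_d = t, d
        return best

    def insert(t, *paths):  # insertPath / insertPair
        for p in paths:
            t[0] |= set(p)
            t[1] |= edges_of(p)

    for _, i, j in pairs:  # packCompatPairs (Algorithm 3)
        p1, p2 = S[i], S[j]
        t1, t2 = covering(p1), covering(p2)
        if t1 is None and t2 is None:
            t = best_tree(p1, p2)
            if t is not None:
                insert(t, p1, p2)
            else:
                trees.append([set(p1) | set(p2), edges_of(p1) | edges_of(p2)])
        elif t1 is None or t2 is None:
            q, t = (p1, t2) if t1 is None else (p2, t1)
            if degree(t, q) <= 0:
                t = best_tree(q)  # None means q is postponed
            if t is not None:
                insert(t, q)
    # compAndSortSinglePaths + packSinglePaths: uncovered paths, longest first.
    for p in sorted((p for p in S if covering(p) is None), key=len, reverse=True):
        if covering(p) is None:
            t = best_tree(p)
            if t is not None:
                insert(t, p)
            else:
                trees.append([set(p), edges_of(p)])
    return trees


T = bouquet([[5, 4, 2, 6], [4, 2, 5], [1, 3], [3, 1, 4, 5]])
print(len(T))  # 2
```

On S¯ the sketch follows the walkthrough given after Algorithm 3: the pair (p¯1,p¯4) creates the first tree, (p¯3,p¯4) is already covered, and p¯2 ends up in a second tree.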
Algorithm 2 compAndSortCompatPairs
1: Input: S={p1,p2,…,pm}, the set of paths
2: X←∅
3: for i←1 to m do
4:   pi.pot←0
5: end for
6: for i←1 to m−1 do
7:   for j←i+1 to m do
8:     d←δ(pi,pj)
9:     if d>0 then
10:      X←X∪{(pi,pj)}
11:      (pi,pj).deg←d
12:      pi.pot←pi.pot+d
13:      pj.pot←pj.pot+d
14:    end if
15:  end for
16: end for
17: for all x=(p,q)∈X do
18:   x.pot←p.pot+q.pot
19:   x.len←length(p)+length(q)
20: end for
21: sort X in descending order of deg/pot/len
22: return X

Algorithm 3 packCompatPairs
1: Input: X, the set of compatible pairs of paths
2: T←∅
3: for all x=(p1,p2)∈X do
4:   t1←findCoveringTree(T,p1)
5:   t2←findCoveringTree(T,p2)
6:   if t1≠null and t2≠null then
7:     {do nothing}
8:   else if t1=null and t2=null then
9:     t←findCompatTreeWithPair(T,x)
10:    if t≠null then
11:      insertPair(x,t)
12:    else
13:      T←T∪{createTree(x)}
14:    end if
15:  else {1 path covered, 1 path not covered}
16:    if t1=null then
17:      q←p1, t←t2
18:    else
19:      q←p2, t←t1
20:    end if
21:    if areCompatible(q,t) then
22:      insertPath(q,t)
23:    else
24:      tq←findCompatTreeWithPath(T,q)
25:      if tq≠null then
26:        insertPath(q,tq)
27:      end if
28:    end if
29:  end if
30: end for
31: return T

Assuming that a tree covering a path is associated with that path in the obvious cases (when createTree, insertPair, insertPath and findCoveringTree are performed), function compAndSortSinglePaths iterates over the set of paths, collects those that are not associated with any tree, and sorts them in decreasing order of length, to be processed by function packSinglePaths. The processing of a single path p also starts by searching for a tree that covers it. If one is found, nothing more is done. Otherwise, if p is compatible with some tree, p is inserted into an existing tree; in the remaining cases, tree Gp is created and added to the result set.

The search for a tree compatible with a path or a pair of paths α (implemented in findCompatTreeWithPath and findCompatTreeWithPair) returns, in case of success, the tree t into which α will be inserted. Among all the trees compatible with α, t is one that maximizes the compatibility degree, i.e. t satisfies
$$\delta(\alpha,t)\ge 1 \;\wedge\; (\forall t'\in T)\ \delta(\alpha,t)\ge\delta(\alpha,t').$$

With input S¯, BOUQUET returns the trees drawn in Fig. 3. The processing of the first pair, (p¯1,p¯4), necessarily ends with the creation of the first tree, t¯2, since there are no trees yet. Then, since both elements of (p¯3,p¯4) are covered by that tree, nothing more is done in function packCompatPairs. The third step returns a single path, p¯2, which, being incompatible with t¯2, gives rise to tree t¯3.

FIGURE 3. Output of BOUQUET with input S¯.

The expected running time of the algorithm is $O(|S|^3 \times |V|)$, where V denotes the set of nodes of network G, for the following reasons.
The compatibility degree of two entities can be computed in O(∣V∣) steps, because all nodes belong to V, the length of any path cannot exceed ∣V∣−1, and a tree has at most ∣V∣−1 edges. Therefore, function compAndSortCompatPairs runs in $O(|S|^2 \times (|V| + \log|S|))$ time, since all pairs of paths are analysed and the compatible pairs are sorted. Function compAndSortSinglePaths runs in $O(|S| \times \log|S|)$ time, due to sorting. In the two packing steps, all pairs of compatible paths and all single paths are processed. As checking whether a path is contained in a tree takes O(∣V∣) average time (using a hash table), the processing of each pair or single path requires O(∣T∣×∣V∣) time on average, where T represents the final set of trees. Consequently, the total cost of these phases is $O(|S|^2 \times |T| \times |V|)$, and the conclusion follows from the fact that ∣T∣≤∣S∣. In the following section, we turn to the evaluation of the results of BOUQUET when used with concrete networks and sets of paths.

4. EVALUATION OF BOUQUET

4.1. Networks and path sets
Apart from some particular cases, such as some regular networks, an optimal solution to the min-TP instance corresponding to a given network and set of paths is not known. Therefore, the performance evaluation of BOUQUET embraces:
(i) experiments on synthetic regular networks and path sets for which optimal solutions are known;
(ii) experiments with known public backbones and specially selected sets of paths for which optimal solutions are unknown.
We characterize below the path sets S used in the experiments. We call them interesting path sets and have computed one such set for each network, by selecting some paths for every pair $(x,y)\in N^2$ such that x<y. The criteria used for path selection were load distribution and fault tolerance, since traffic steering for security often requires only a small set of very specific paths. For more details, see [21].

4.1.1. Synthetic regular networks
Four synthetic regular network configurations were chosen for testing purposes: full mesh, ring, hierarchical and folded clos. In all of them, the weight of every edge is 1, that is, all links have cost 1. In the sequel, $P=n(n-1)/2$ is the number of pairs of ingress/egress nodes.

In full mesh and ring networks, all nodes are origins and destinations of traffic. An n-node full mesh is a clique (as illustrated in Fig. 4, on the left). Between any two distinct nodes, there is one shortest path (with cost 1) and n−2 paths of cost 2. These are the interesting paths, leading to a set S with (n−1)P paths. In a ring network, each node has degree 2 and there are two (disjoint) simple paths to reach any other node (see Fig. 4, on the right). The set S of interesting paths is the set of all simple paths and has 2P elements. In both configurations, paths can be aggregated into n trees.

FIGURE 4. Full mesh (left) and ring (right) networks.

In traditional small to medium data centre networks, hierarchical tree-like networks are commonly used (see Fig. 5, on the left). In these hierarchical networks, the interesting paths between any two leaf nodes (the only ones that belong to N) are the shortest ones, and there are $2^{2m-1}$ such paths, where m is the minimum number of levels that must be climbed to reach the destination node. All these paths are valley-free, in the sense that they ‘climb and descend only once’.

FIGURE 5. Two-level hierarchical (left) and folded clos (right) networks.

A folded clos network is defined by two distinct layers forming a bipartite graph, where each node of the lower layer (which corresponds to the set N) is directly connected to every node of the upper layer.
Figure 5 (on the right) presents a folded clos network with six nodes in each layer. For each pair of nodes of the lower layer, the interesting paths are the shortest paths, which are valley-free and pairwise disjoint. Their number equals the size of the upper layer. Therefore, in a folded clos network with n nodes in each layer, there are nP interesting paths, which can be aggregated into n trees. These networks are also traditionally used in data centres and have the interesting property of maximizing path disjointness between nodes, thus also maximizing fault tolerance. The first six rows of Table 2 characterize the chosen regular synthetic networks.

TABLE 2. Characteristics of the networks used in the evaluation of BOUQUET: number of nodes (∣V∣), number of edges (∣E∣), number of ingress/egress nodes (n=∣N∣), number of interesting paths (∣S∣) and minimum number of trees needed to cover those paths (min∣T∣, where ‘–’ stands for unknown).

Network        ∣V∣  ∣E∣  ∣N∣  ∣S∣   min∣T∣
Full mesh      12   66   12   726   12
Ring           12   12   12   132   12
Hierarch. 2    14   24   8    152   8
Hierarch. 3    30   56   16   2352  32
Fold. clos 6   12   36   6    90    6
Fold. clos 12  24   144  12   792   12
Abovenet       15   44   15   420   –
ATT            35   68   35   2366  –
B4             12   19   12   264   –
Géant          32   49   32   1991  –
NTT            27   63   27   1404  –
Sprint         32   64   32   1984  –
Tiscali        30   76   30   1740  –
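The interesting-path counts in the synthetic rows of Table 2 follow from the formulas of Section 4.1.1, as the Python sketch below checks (helper names are hypothetical). The hierarchical topology is not fully specified in the text, so the sketch assumes one: pods of four leaves, each dual-homed to two switches, with two groups merged at each higher level and every switch linked to both switches of the level above. This assumption is ours; it is consistent with the ∣V∣, ∣E∣ and ∣S∣ values of the ‘Hierarch. 2’ and ‘Hierarch. 3’ rows, but it may differ from the networks actually used.

```python
from math import comb


def full_mesh_paths(n):
    """1 direct path + (n - 2) two-hop paths per pair: (n - 1)P."""
    return (n - 1) * comb(n, 2)


def ring_paths(n):
    """Two disjoint simple paths per pair: 2P."""
    return 2 * comb(n, 2)


def folded_clos_paths(n):
    """One shortest path through each of the n upper-layer nodes: nP."""
    return n * comb(n, 2)


def hierarchical_paths(levels, leaves_per_pod=4, fanout=2):
    """Sum of 2**(2m - 1) over leaf pairs, m being the number of levels climbed.

    ASSUMED topology (not given in the text): pods of `leaves_per_pod` leaves,
    with `fanout` groups merged at each higher level.
    """
    n_leaves = leaves_per_pod * fanout ** (levels - 1)
    total, pairs_below = 0, 0
    for m in range(1, levels + 1):
        group = leaves_per_pod * fanout ** (m - 1)           # leaves under one level-m group
        pairs_within = (n_leaves // group) * comb(group, 2)  # pairs inside some level-m group
        total += (pairs_within - pairs_below) * 2 ** (2 * m - 1)
        pairs_below = pairs_within
    return total


print(full_mesh_paths(12), ring_paths(12))           # 726 132
print(folded_clos_paths(6), folded_clos_paths(12))   # 90 792
print(hierarchical_paths(2), hierarchical_paths(3))  # 152 2352
```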
4.1.2. Backbone networks
Seven backbone networks were selected for testing purposes. Only point-of-presence (POP) level network models were used, since routing policies are more relevant at this level than at the level of individual routers. Besides, using the routing implementations presented in Section 5, several routers of the same POP can be collapsed into a single logical one, and Equal-Cost Multi-Path routing (ECMP) can be used to load balance traffic across several parallel links directly connecting two POPs. Moreover, publicly documented backbones were preferred, to avoid resorting to synthetic, randomized backbone topologies. A European research and education backbone (Géant), a worldwide backbone (NTT Communications) and the worldwide inter-data-centre backbone used in [9] (B4) were chosen because sketches of their configurations are publicly available. These three backbones were approximately mapped from their publicly presented diagrams. Additionally, we used four commercial ISP backbones mapped by the Rocketfuel project [22]: Abovenet and ATT (USA backbones), Sprint (worldwide backbone) and Tiscali (European backbone).
In all these backbone networks, since link capacities were, in general, not published, latency (approximately inferred from the geographic distance between the cities where the POPs are located) is used as the cost metric. This metric is closely related to latency inflation, which is particularly relevant for selecting paths in wide area backbones. To speed up path set computations, backbone graphs were (iteratively) shrunk by pruning nodes of degree 1, since these nodes do not introduce any extra diversity or path alternatives. The last seven rows of Table 2 present the characteristics of the retained backbone networks.

The numbers of required paths and trees will increase when they are used to drive the routing configuration of a real network. Degree-1 nodes will be reinserted, which will increase the number of paths, but not the number of trees. Besides, provisioning different classes of service to customers will also increase the number of paths and trees by a factor proportional to the number of traffic classes used. However, all multi-path routing methods require the same increase in backbone state and control complexity if several service classes are used. Resource reservation for traffic classes (and therefore for different trees) is outside the scope of this paper. In any case, in large backbones, shaping and admission control, when applied, are performed at the edge to relieve core routers of these concerns.

To obtain a set of interesting paths for each of these backbones, we used an algorithm (described in [21]) that selects a set of paths between each pair of distinct ingress/egress nodes (POPs). Notice that N = V in all these networks. The criteria used to compute a set of k paths from a node x to a node y maximize the inclusion of shortest paths and aim at improving path disjointness (in terms of edges), while avoiding paths whose latency and length (number of edges) exceed those of the shortest paths by more than given thresholds.
In general, the returned set has k paths, but it can have fewer than k, when there are not that many simple paths from x to y, or k+1, when an extra path is added to ensure tolerance to a single edge failure. We set k = 4, a value often considered adequate for a backbone, since larger values do not yield significant improvements in the optimality of traffic distribution (see e.g. [9, 23]). For each backbone, the total number of interesting paths (∣S∣) is presented in Table 2, while the average hop and latency stretches of these paths over the shortest paths are given in Table 3. The last three columns of Table 3 characterize path disjointness (in terms of edges) by presenting the percentage of pairs (x, y) for which the set of interesting paths from x to y contains no disjoint paths, two (pairwise) disjoint paths, or more than two pairwise disjoint paths. Note that a pair without disjoint paths may not tolerate a single edge fault, whereas a pair with d ≥ 2 pairwise disjoint paths tolerates any d − 1 simultaneous edge faults.

TABLE 3. Characteristics of the path sets used for the backbone networks. For every node pair, a stretch is the absolute value of the difference between the average over the shortest paths and the average over the interesting paths. Latency stretch is expressed in milliseconds, the same unit used for link costs. The last three columns give the percentage of pairs that tolerate 0, 1, and 2 or 3 simultaneous edge faults.

| Network | Average hop stretch | Average latency stretch (ms) | % pairs tolerating 0 faults | % pairs tolerating 1 fault | % pairs tolerating 2 or 3 faults |
|---|---|---|---|---|---|
| Abovenet | 1.098 | 4.92 | 0 | 57.1 | 42.9 |
| ATT | 1.209 | 5.00 | 16.1 | 61.7 | 22.2 |
| B4 | 1.277 | 12.85 | 0 | 87.9 | 12.1 |
| Géant | 1.746 | 8.49 | 0 | 89.5 | 10.5 |
| NTT | 1.042 | 11.15 | 0 | 67.2 | 32.8 |
| Sprint | 1.254 | 6.49 | 0 | 66.1 | 33.9 |
| Tiscali | 0.809 | 1.69 | 0 | 53.1 | 46.9 |
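The classification behind the last three columns of Table 3 can be sketched per node pair in Python: find the size d of the largest set of pairwise edge-disjoint paths among the pair's interesting paths, and report d − 1 guaranteed tolerated faults. This is an illustrative sketch, not the code used for the paper. The exhaustive subset search relies on each pair having at most k + 1 = 5 paths, the last bucket groups two or more tolerated faults (labelled "2 or 3" in Table 3), and the example path sets are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple

Path = Sequence[str]    # a path as a sequence of node names
Edge = FrozenSet[str]   # an undirected edge {u, v}


def edges_of(path: Path) -> Set[Edge]:
    """Undirected edges traversed by a path."""
    return {frozenset(e) for e in zip(path, path[1:])}


def max_disjoint(paths: List[Path]) -> int:
    """Size d of the largest set of pairwise edge-disjoint paths.

    Exhaustive search over subsets, largest first; acceptable because a
    pair has at most k + 1 = 5 interesting paths in this setting.
    """
    edge_sets = [edges_of(p) for p in paths]
    for size in range(len(edge_sets), 0, -1):
        for subset in combinations(edge_sets, size):
            if all(a.isdisjoint(b) for a, b in combinations(subset, 2)):
                return size
    return 0


def tolerated_faults(paths: List[Path]) -> int:
    """With d pairwise edge-disjoint paths, any d - 1 edge faults leave one intact."""
    return max(max_disjoint(paths) - 1, 0)


def fault_profile(path_sets: List[List[Path]]) -> Tuple[float, float, float]:
    """Percentages of pairs tolerating 0, 1, and 2 or more simultaneous edge faults."""
    counts = [0, 0, 0]
    for paths in path_sets:
        counts[min(tolerated_faults(paths), 2)] += 1
    return tuple(100.0 * c / len(path_sets) for c in counts)


# Hypothetical path sets for three node pairs.
pair_a = [("x", "a", "y"), ("x", "b", "y"), ("x", "a", "b", "y")]  # d = 2
pair_b = [("u", "v", "w"), ("u", "v", "z", "w")]                   # share u-v: d = 1
pair_c = [("p", "q"), ("p", "r", "q"), ("p", "s", "q")]            # d = 3
print([tolerated_faults(p) for p in (pair_a, pair_b, pair_c)])     # [1, 0, 2]
print(fault_profile([pair_a, pair_b, pair_c]))
```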
4.2. Results of path aggregation
Table 4 presents the results obtained with the path aggregation algorithms. The first columns identify the network, the total number of paths to be aggregated and, for the regular networks, the minimum number of trees required to cover them. The next three columns contain the number of trees built by BOUQUET and the number of graphs computed by the two algorithms proposed for the SPAIN system (SP1 and SP2). Note that these algorithms have been implemented as described in Section 2 and by the authors of [4, 18]; consequently, their output is a set of acyclic graphs, not necessarily connected. The last column contains the minimum number of trees that would be obtained with the divide-by-destination strategy (which corresponds to (n−1)k; see the discussion in Section 2).

TABLE 4. Results of the path aggregation algorithms: number of paths to be aggregated (∣S∣), minimum number of trees to cover them (min∣T∣, where ‘–’ stands for unknown), number of trees built by BOUQUET (BQ), number of acyclic graphs computed by the SPAIN algorithms (SP1 and SP2), and number of trees obtained with the divide-by-destination strategy (DD).

| Network | ∣S∣ | min∣T∣ | BQ | SP1 | SP2 | DD |
|---|---|---|---|---|---|---|
| Full mesh | 726 | 12 | 12 | 58 | 131 | 121 |
| Ring | 132 | 12 | 12 | 12 | 12 | 22 |
| Hierarch. 2 | 152 | 8 | 8 | 11 | 32 | 56 |
| Hierarch. 3 | 2352 | 32 | 40 | 73 | 300 | 496 |
| Fold. Clos 6 | 90 | 6 | 6 | 12 | 29 | 30 |
| Fold. Clos 12 | 792 | 12 | 12 | 57 | 123 | 132 |
| Abovenet | 420 | – | 24 | 27 | 47 | 56 |
| ATT | 2366 | – | 54 | 66 | 168 | 136 |
| B4 | 264 | – | 20 | 20 | 25 | 44 |
| Géant | 1991 | – | 57 | 70 | 143 | 120 |
| NTT | 1404 | – | 42 | 56 | 114 | 104 |
| Sprint | 1984 | – | 52 | 67 | 151 | 124 |
| Tiscali | 1740 | – | 27 | 37 | 93 | 116 |
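The percentages used in this section to compare the algorithms on the regular networks are relative excesses over the optimum, (∣T∣ − min∣T∣)/min∣T∣. A minimal Python sketch that recomputes them from the regular-network rows of Table 4 (values copied from the table; the dictionary layout and names are ours):

```python
# Regular-network rows of Table 4: network -> (min|T|, BQ, SP1, SP2)
TABLE4 = {
    "Full mesh": (12, 12, 58, 131),
    "Ring": (12, 12, 12, 12),
    "Hierarch. 2": (8, 8, 11, 32),
    "Hierarch. 3": (32, 40, 73, 300),
    "Fold. Clos 6": (6, 6, 12, 29),
    "Fold. Clos 12": (12, 12, 57, 123),
}


def excess(result: int, optimum: int) -> float:
    """Relative distance from the optimum, as a percentage of the optimum."""
    return 100.0 * (result - optimum) / optimum


gaps = {
    name: tuple(round(excess(r, opt), 1) for r in (bq, sp1, sp2))
    for name, (opt, bq, sp1, sp2) in TABLE4.items()
}
for name, (bq, sp1, sp2) in gaps.items():
    print(f"{name:<13} BQ {bq:6.1f}%   SP1 {sp1:6.1f}%   SP2 {sp2:6.1f}%")
```

For BOUQUET this gives 0% everywhere except Hierarch. 3 (25.0%); for SP1 it gives 0% only for the ring and between 37.5% (Hierarch. 2) and 383.3% (Full mesh) elsewhere.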
Since SP1 and the merging phase of SP2 (which works on a random permutation of all subgraphs computed in the first step) are randomized algorithms, they were executed several times on the same set of paths. However, instead of fixing the exact number of executions (also called iterations) per network, we limited their total execution time to 100 times the time required by BOUQUET for the same set of paths. Table 4 reports the minimum number of acyclic subgraphs obtained over all iterations. Table 5 presents the execution times of BOUQUET, the thresholds for the total execution time of SP1 and for the total time spent by SP2 in ‘merging iterations’ (as if the first step were instantaneous), and the number of iterations actually performed within that time. All algorithms were executed in Java on a 3.5 GHz Intel i5 CPU with 8 GB of 1600 MHz DRAM, with the operating system running in single-user mode.

TABLE 5. Execution times (in seconds) of BOUQUET (BQ), thresholds (in seconds) for the total execution time spent in iterations by the SPAIN algorithms (SP1 and SP2), and the corresponding numbers of iterations performed.

| Network | BQ time (s) | Thresh. (s) | SP1 # iter. | SP2 # iter. |
|---|---|---|---|---|
| Full mesh | 0.149 | 14.9 | 4186 | 2741 |
| Ring | 0.002 | 0.2 | 3844 | 3645 |
| Hierarch. 2 | 0.022 | 2.2 | 9221 | 5113 |
| Hierarch. 3 | 4.762 | 476.2 | 26 308 | 7087 |
| Fold. Clos 6 | 0.008 | 0.8 | 3755 | 3015 |
| Fold. Clos 12 | 0.190 | 19 | 2242 | 2140 |
| Abovenet | 0.044 | 4.4 | 5390 | 3357 |
| ATT | 1.627 | 162.7 | 12 102 | 6763 |
| B4 | 0.014 | 1.4 | 3327 | 2629 |
| Géant | 0.761 | 76.1 | 6461 | 3671 |
| NTT | 0.475 | 47.5 | 7207 | 5019 |
| Sprint | 0.757 | 75.7 | 7129 | 3946 |
| Tiscali | 1.246 | 124.6 | 27 633 | 14 936 |

The tests performed with the synthetic regular networks are useful for assessing the absolute quality of the results, because these can be compared with the optimal solutions. BOUQUET computes an optimal solution for all of these networks except the hierarchical 3 one, where its result (40 trees) is 25% above the minimum (32). Despite the allowed running times, SP1 only found an optimal set for the ring network. In all other cases, it returned between 37.5% (hierarchical 2) and 383% (full mesh) more graphs than the optimum.
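The experimental protocol used for SP1 and SP2, i.e. re-running a randomized heuristic until a time budget of 100 times BOUQUET's running time is exhausted and keeping the smallest result, can be sketched in Python as follows. The heuristic plugged in here is a hypothetical stand-in (random-order greedy placement of paths into acyclic graphs), not SPAIN's actual SP1 or SP2; the BOUQUET running time and all names are illustrative assumptions.

```python
import random
import time
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

Path = Sequence[int]
Graph = Set[FrozenSet[int]]  # a graph as a set of undirected edges


def path_edges(path: Path) -> Graph:
    return {frozenset(e) for e in zip(path, path[1:])}


def is_forest(edges: Graph) -> bool:
    """Union-find check that an undirected edge set contains no cycle."""
    parent = {}

    def find(v):
        while parent.get(v, v) != v:
            v = parent[v]
        return v

    for e in edges:
        a, b = tuple(e)
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
    return True


def random_greedy(paths: List[Path], rng: random.Random) -> List[Graph]:
    """Hypothetical stand-in heuristic (not SPAIN's SP1/SP2): place paths, in
    random order, into the first graph that stays acyclic, else open a new one."""
    order = list(paths)
    rng.shuffle(order)
    graphs: List[Graph] = []
    for p in order:
        pe = path_edges(p)
        for g in graphs:
            if is_forest(g | pe):
                g |= pe
                break
        else:
            graphs.append(set(pe))
    return graphs


def best_within_budget(heuristic: Callable[[List[Path], random.Random], List[Graph]],
                       paths: List[Path], budget_s: float,
                       seed: int = 0) -> Tuple[List[Graph], int]:
    """Re-run a randomized heuristic until budget_s seconds have elapsed and
    keep the solution with the fewest graphs (at least one run is made)."""
    rng = random.Random(seed)
    deadline = time.perf_counter() + budget_s
    best, runs = heuristic(paths, rng), 1
    while time.perf_counter() < deadline:
        candidate = heuristic(paths, rng)
        runs += 1
        if len(candidate) < len(best):
            best = candidate
    return best, runs


# Example: interesting paths (cost <= 2) of a 5-node full mesh.
n = 5
mesh_paths = [(x, y) for x, y in combinations(range(n), 2)]
mesh_paths += [(x, z, y) for x, y in combinations(range(n), 2)
               for z in range(n) if z not in (x, y)]
bq_time = 0.001  # hypothetical BOUQUET running time, in seconds
graphs, runs = best_within_budget(random_greedy, mesh_paths, 100 * bq_time)
print(f"{len(graphs)} acyclic graphs after {runs} runs")
```

Because two distinct paths with the same endpoints always close a cycle, each pair's n − 1 paths must land in different graphs, so no run can return fewer than n − 1 graphs; the time budget only controls how many random orders are tried.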
The results of SP2 on these networks are much worse. As for the backbone networks, where optimal solutions are not known, BOUQUET always outperformed SP1, except for B4, for which both returned sets of the same size. The results of SP2 are again much worse. The values in the divide-by-destination column are always substantially higher than those obtained with BOUQUET and SP1, which shows that this strategy is not effective when the goal is to significantly reduce the size of FIBs in the core of the network. SP2, which follows the same strategy, performed poorly, revealing that its merging phase was not very successful. In the next section, we discuss how the resulting trees can be used to drive traffic routing in the network.

5. IMPLEMENTING MULTI-PATH ROUTING BASED ON TREES
In this section, we present and discuss ways of using BOUQUET to implement multi-path routing with off-the-shelf switching or routing equipment in the framework presented in Section 1. That framework rests on several cornerstones:
- using many precomputed paths in the core of the network;
- shielding core equipment from dynamic routing as much as possible;
- making edge equipment responsible for packet flow assignment.
This kind of network architecture requires several components with different roles, and an implementation strategy for each one:
- A logically centralized controller that monitors the network and transforms network management policies into directives to the other components. Its discussion is outside the scope of this paper; see [9, 10, 17, 24] for concrete examples.
- An implementation of a precomputed data plane that routes packets based on trees in the backbone. This can be implemented using some form of tagging (e.g. VLAN tagging) or tunnels (e.g. GRE and IP routes). Using MPLS, even with multipoint-to-point LSPs, would require more data-plane entries in switches, as shown in the previous sections. This is a popular alternative that we will not discuss further.
- Mechanisms allowing edge routers or switches to direct packet flows to the chosen paths. These depend on the way paths are implemented in the backbone.
The last two points are illustrated below with two concrete alternatives.

5.1. Routing with trees using VLANs
In a network of Ethernet switches, it is possible to statically configure in each switch a mapping from ports to VLANs [4]. Each tree, encompassing different paths, is then supported by a different VLAN. With this solution, given the Medium Access Control (MAC) address of the destination, the selection of a path to a destination switch (egress node) is implemented by choosing the corresponding VLAN tag. Routing is then performed using VLAN-restricted flooding and filtering based on the destination MAC address. Given the IP address of the destination, it must first be mapped to a MAC address, and a path must be chosen to reach the switch to which the destination is directly connected; finally, the chosen path is mapped to a VLAN. The authors of [4] implemented these mechanisms inside the Ethernet drivers of data centre servers, using simple mechanisms to replicate the information required by these mappings. To speed up the learning of destination locations in each tree, virtual machines broadcast their location (e.g. using gratuitous ARP, the Address Resolution Protocol) on each VLAN through which they are reachable, at initialization, periodically, or after any migration. However, the mappings from IP addresses to host MAC addresses, from host MAC addresses to switches, and from paths to VLANs can also be managed by other solutions, including ones based on central controllers or other software-based solutions, e.g. in the same vein as those presented by Greenberg et al. [3].

5.2. Routing with trees with longest-prefix matching IP routing
Multi-path routing can be implemented by means of several rooted trees using hierarchical IP address prefixes and longest-prefix matching. For this purpose, routers can be configured with static routes as follows. Each tree t ∈ T is associated with an IP prefix I (an interval of addresses whose size is a power of 2); these intervals must be pairwise disjoint. Then, each I is partitioned into as many IP sub-prefixes (i.e. sub-intervals) as the root has children in this tree, and each sub-prefix of I is associated with a different child. This process continues recursively down to the leaves of the tree. In every node except the root, the parent prefix is associated with all interfaces leading to the parent node, and each sub-prefix is associated with all interfaces leading to the child associated with it. Finally, each node also receives an address in a dedicated sub-prefix. Hence, a node has as many addresses as the number of trees it belongs to [25], and the same applies to the number of static routes associated with its interfaces. Each router may also have an IP address in a different prefix, routed by a shortest-path protocol, to guarantee a direct control channel and a fallback path.

The path a packet should follow is selected by choosing, as its destination, the address of the destination router in the tree that contains the chosen path. This rewriting of the destination address may be performed by means of IP-over-IP tunnels or LISP [7, 26], so as to preserve the original packet header. Static routing in conventional low-end routers supports implementing a network node by means of several routers and an edge by means of several parallel links. By using Equal-Cost Multi-Path (ECMP) routing and automatic route deletion when a link fails, simple localized ways of dealing with many faults are immediately and cheaply available.

5.3. Discussion and related alternatives
According to several authors (e.g. [12, 15]), and as already mentioned in Section 1, an approach in which state is installed in the core network for each individual micro flow does not scale, either for intra-Autonomous System (intra-AS) networking or for intra data centre routing. Likewise, recent proposals that apply an SDN approach to wide-area intra-AS routing (e.g. [9, 10]) rely on tunnels to avoid configuring (micro) paths. Routing with precomputed trees in the core can also be implemented by controlling core switches with OpenFlow. However, since all low-end switches support VLANs and static IP routes, using them in the core seems cheaper and even more effective: these devices cheaply support load balancing across equivalent links (e.g. link aggregation) and routes (e.g. ECMP), and many faults can be dealt with locally, without immediate intervention of the central controller.

Routing with trees in a backbone, complemented with an SDN approach in which dynamic routing reconfigurations are only performed at the edge, seems a scalable way of simplifying both the network core and network control, in the same vein as proposed by Casado et al. [12] and by some of us [17]. In addition, when implemented with cheap off-the-shelf equipment and software widely available in all sorts of networks, it offers a novel transition path towards an SDN-controlled network. Other alternatives with the same goal, such as Fibbing, proposed by Vissicchio et al. [24], and Panopticon, proposed by Levin et al. [27], rely on dynamically reconfiguring the way the core performs routing: the former by having the controller inject fake link-state announcements into an OSPF- or IS-IS-controlled backbone, and the latter by building a hybrid network in which OpenFlow-controlled parts interconnect legacy branches of the network.

6. CONCLUSIONS
Provider backbone networks need flexible and adaptable mechanisms to support network control under varying demands and prompt adaptation to network faults and other abnormal events. These goals are often achieved using expensive and sophisticated equipment that runs distributed algorithms to implement dynamic routing in the core of the network (e.g. MPLS with auto-bandwidth features), as well as a certain degree of link over-provisioning. The quest for simpler support of network control, without sacrificing optimality, flexibility and adaptability, is the main focus of the SDN approach. However, to be successful, SDN adoption needs a transition path that does not cause more harm than benefit. This concern led many to envision a simpler core, able to provide a priori as many paths as the edge needs to adapt traffic to varying requirements. Unfortunately, the total number of paths required in the core is very large, and keeping them continuously available entails many different FIB entries in core routers/switches, which increases their complexity and cost. To address this problem, we have developed BOUQUET, an algorithm that aggregates paths into trees and achieves better results than previously proposed algorithms with the same goal. Finally, we have shown how off-the-shelf equipment supporting simple protocols may be used to implement routing with a reduced number of trees, showing that simplicity can be achieved with widely available protocols and their most common, unsophisticated implementations.

ACKNOWLEDGEMENTS
We thank the anonymous reviewers for their suggestions, which helped improve the paper.

Footnotes
1 A cycle is a path with at least two nodes whose first and last nodes are equal and whose edges are all distinct.

REFERENCES
[1] RFC 2328 (Internet Standard), updated by RFCs 5709, 6549, 6845, 6860 (1998) OSPF Version 2. Internet Engineering Task Force, Fremont, CA.
[2] RFC 1142 (Informational) (1990) OSI IS-IS Intra-domain Routing Protocol. Internet Engineering Task Force, Fremont, CA.
[3] Greenberg, A., Hamilton, J. R., Jain, N., Kandula, S., Kim, C., Lahiri, P., Maltz, D. A., Patel, P. and Sengupta, S. (2009) VL2: A scalable and flexible data center network. SIGCOMM Comput. Commun. Rev., 39, 51–62.
[4] Mudigonda, J., Yalagandula, P., Al-Fares, M. and Mogul, J. C. (2010) SPAIN: COTS Data-center Ethernet for Multipathing over Arbitrary Topologies. In Proc. NSDI '10, San Jose, CA, April 28–30, pp. 18–18. USENIX Association, Berkeley, CA.
[5] Wang, N., Ho, K. H., Pavlou, G. and Howarth, M. (2008) An overview of routing optimization for internet traffic engineering. IEEE Commun. Surv. Tutorials, 10, 36–56.
[6] RFC 3031 (Proposed Standard) (2001) Multiprotocol Label Switching Architecture. Internet Engineering Task Force, Fremont, CA.
[7] RFC 2784 (Proposed Standard), updated by RFC 2890 (2000) Generic Routing Encapsulation (GRE). Internet Engineering Task Force, Fremont, CA.
[8] Stephens, B., Cox, A., Felter, W., Dixon, C. and Carter, J. (2012) PAST: Scalable Ethernet for Data Centers. In Proc. CoNEXT '12, Nice, France, December 10–13, pp. 49–60. ACM, NY.
[9] Jain, S. et al. (2013) B4: Experience with a globally-deployed software defined WAN. SIGCOMM Comput. Commun. Rev., 43, 3–14.
[10] Hong, C.-Y., Kandula, S., Mahajan, R., Zhang, M., Gill, V., Nanduri, M. and Wattenhofer, R. (2013) Achieving high utilization with software-driven WAN. SIGCOMM Comput. Commun. Rev., 43, 15–26.
[11] Caesar, M., Casado, M., Koponen, T., Rexford, J. and Shenker, S. (2010) Dynamic route recomputation considered harmful. SIGCOMM Comput. Commun. Rev., 40, 66–71.
[12] Casado, M., Koponen, T., Shenker, S. and Tootoonchian, A. (2012) Fabric: A Retrospective on Evolving SDN. In Proc. HotSDN '12, Helsinki, Finland, August 13, pp. 85–90. ACM, NY.
[13] Feamster, N., Rexford, J. and Zegura, E. (2013) The road to SDN. Queue, 11, 20:20–20:40.
[14] McKeown, N., Anderson, T., Balakrishnan, H., Parulkar, G., Peterson, L., Rexford, J., Shenker, S. and Turner, J. (2008) OpenFlow: enabling innovation in campus networks. SIGCOMM Comput. Commun. Rev., 38, 69–74.
[15] Raghavan, B., Casado, M., Koponen, T., Ratnasamy, S., Ghodsi, A. and Shenker, S. (2012) Software-Defined Internet Architecture: Decoupling Architecture from Infrastructure. In Proc. HotNets-XI, Redmond, Washington, October 29–30, pp. 43–48. ACM, NY.
[16] Suchara, M., Xu, D., Doverspike, R., Johnson, D. and Rexford, J. (2011) Network architecture for joint failure recovery and traffic engineering. SIGMETRICS Perform. Eval. Rev., 39, 97–108.
[17] Martins, J. L. and Campos, N. (2016) Short-sighted routing, or when less is more. IEEE Commun. Magazine, 54, 82–88.
[18] Mudigonda, J., Yalagandula, P., Al-Fares, M. and Mogul, J. (2009) SPAIN: Design and Algorithms for Constructing Large Data-center Ethernets from Commodity Switches. Technical report HPL-2009-241. HP Labs, Palo Alto, CA.
[19] Saito, H., Miyao, Y. and Yoshida, M. (2000) Traffic Engineering Using Multiple Multipoint-to-Point LSPs. In Proc. INFOCOM 2000, Tel Aviv, Israel, March 26–30, vol. 2, pp. 894–901. IEEE, Los Alamitos, CA.
[20] Bhatnagar, S., Ganguly, S. and Nath, B. (2002) Label space reduction in multipoint-to-point LSPs for traffic engineering. In Proc. ECUMN 2002, Colmar, France, April 8–10, pp. 29–35. IEEE, Los Alamitos, CA.
[21] Mamede, M., Martins, J. L. and Horta, J. (2016) Relieving Core Routers from Dynamic Routing with Off-the-Shelf Equipment and Protocols. Technical report. Cornell University Library, Ithaca, NY. http://arxiv.org/abs/1612.07064.
[22] Spring, N., Mahajan, R. and Anderson, T. (2003) Quantifying the Causes of Path Inflation. In Proc. SIGCOMM '03, Karlsruhe, Germany, August 25–29, pp. 113–124. ACM, NY.
[23] Heckmann, O. M. (2006) The Competitive Internet Service Provider: Network Architecture, Interconnection, Traffic Engineering and Network Design. John Wiley & Sons, Chichester, UK.
[24] Vissicchio, S., Tilmans, O., Vanbever, L. and Rexford, J. (2015) Central control over distributed routing. SIGCOMM Comput. Commun. Rev., 45, 43–56.
[25] Tsuchiya, P. F. (1991) Efficient and robust policy routing using multiple hierarchical addresses. SIGCOMM Comput. Commun. Rev., 21, 53–65.
[26] RFC 6830 (Experimental) (2013) The Locator/ID Separation Protocol (LISP). Internet Engineering Task Force, Fremont, CA.
[27] Levin, D., Canini, M., Schmid, S., Schaffert, F. and Feldmann, A. (2014) Panopticon: Reaping the Benefits of Incremental SDN Deployment in Enterprise Networks. In Proc. USENIX ATC 14, Philadelphia, PA, June 17–20, pp. 333–345. USENIX Association, Berkeley, CA.

© The British Computer Society 2018. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
The Computer Journal – Oxford University Press
Published: Oct 1, 2018