Xinjian Luo, Xiaofeng Gao and Guihai Chen

Abstract Data centers, receiving increased attention in the data management and analysis communities, have posed new challenges for data-intensive applications, among which efficient query processing holds a critical position. To accelerate multi-dimensional data retrieval, we propose a distributed multi-dimensional indexing scheme for switch-centric data centers in this paper. We first propose FR-Index, a two-layer indexing system integrating the Fat-tree topology and the R-tree indexing structure. In the lower layer, each server indexes its local data with an R-tree, while in the upper layer a distributed global index depicts an overview of the whole dataset. Based on the Fat-tree topology, we design a specific indexing space partitioning and mapping strategy for efficient global index maintenance and query processing. Furthermore, we develop a cost model to dynamically update FR-Index. Experiments on Amazon's EC2 platform, comparing FR-Index with RT-CAN and RB-Index, show that the proposed indexing scheme is scalable, efficient and lightweight, and can significantly improve the efficiency of query processing in data centers. 1. INTRODUCTION Massive storage systems have received increasing attention in both academia and industry. Various distributed storage systems, such as GFS [1], Cassandra [2] and Dynamo [3], were designed to satisfy the increasing requirements of data-intensive applications. An attractive challenge for large-scale distributed storage systems is how to retrieve specified data from a massive dataset efficiently. Empirically, designing an appropriate and effective index is a typical solution to this challenge. Refs [4–6] designed distributed indexing schemes in P2P systems. Each of them deploys a P2P topology on the distributed server cluster as an overlay for data mapping and query routing.
However, the underlying topologies of P2P networks are logically defined, which means the nodes are scattered geographically and the connections between nodes are unstable, leading to unreliable services [7]. In recent years, as a type of infrastructure, data centers have played an increasingly vital role in cloud services. In data centers, a great number of servers are organized into a specific topology by the data center network (DCN). For example, Cisco employs the fat-tree topology to support efficient communications in its data centers [8]. With the rapid development of data centers, some critical issues for performance optimization have arisen. One of them is how to construct efficient indexing schemes for storage systems deployed on data centers. Different from the logical topologies in P2P networks, however, a DCN defines a specific physical topology, connecting nodes in a strict manner. Thus, it is impractical to simply transplant indexing schemes from P2P networks to data centers. Gao et al. [7, 9–11] have proposed network-aware indexing techniques, which are typically two-layer indexing schemes. In the two-layer indexing framework, each server indexes its data via a local index structure, such as an R-tree. Then some index nodes in each server are selected and published among the cluster as the global index. Although those works presented excellent proposals, they did not consider multi-dimensional indexing in switch-centric DCNs. In fact, multi-dimensional indexing strategies are widely adopted by many commercial applications. A simple application is photo search via metadata tags. A photo object can be expressed as {t1, t2, …, tn}, where ti denotes its features and geographic information, such as latitude, longitude, topic, color, size, etc. Typical queries could be searching for photos in a specific location or with specific features.
To efficiently support these queries, a multi-dimensional indexing strategy designed for data centers is critical. This paper targets the problem of constructing a distributed multi-dimensional two-layer indexing scheme on the Fat-tree topology of switch-centric DCNs. There are two main challenges in constructing a multi-dimensional indexing scheme on Fat-tree: first, how to partition and map the index space, a hypercube, to servers interconnected in a tree-like topology; and second, how to publish and update the global index distributed among the cluster. We design FR-Index, a two-layer indexing system that fully exploits the Fat-tree topology and R-tree indexing technology. In FR-Index, each node plays two roles, i.e. storage node and overlay node. In the lower layer, each server, acting as a storage node, stores the multi-dimensional data and indexes them with an R-tree. In the upper layer, the indexing space of the entire dataset is partitioned and designated to each server, and the global index is selected from each server's local index and scattered among servers based on their overlay positions. The distributed global index depicts an overview of the whole dataset. To partition and map the multi-dimensional indexing space onto Fat-tree, we selectively reduce and divide the index dimensionality. Efficient query processing methods are proposed to accelerate data retrieval based on this indexing system. A cost model based on a two-state Markov chain model and the Fat-tree routing protocol is proposed to dynamically update FR-Index. We also conducted extensive experiments on the Amazon EC2 platform to evaluate the performance of our proposal. Comparisons between FR-Index and RT-CAN/RB-Index illustrate the availability and efficiency of FR-Index.
To summarize, the contributions of this paper are as follows. We propose a distributed two-layer indexing scheme, FR-Index, for efficient data retrieval in switch-centric data centers with tree-like topologies. FR-Index fully exploits the Fat-tree routing protocol and R-tree structure to accelerate data retrieval. We design a novel indexing space partitioning and mapping strategy to publish the global index among the cluster. We propose a cost model based on a Markov model and the Fat-tree routing protocol to dynamically maintain and update FR-Index. We conduct experiments on the Amazon EC2 platform to validate the performance of FR-Index. Comparisons between our scheme and RT-CAN/RB-Index exhibit the efficiency and availability of FR-Index. The remainder of this paper is organized as follows. In Section 2, we introduce related work. The architecture of the FR-Index system is discussed in Section 3, followed by the query processing strategy in Section 4. Section 5 discusses index maintenance and updating in FR-Index. By comparing with RT-CAN and RB-Index, we evaluate the performance of FR-Index in Section 6. Finally, we summarize our work in Section 7. 2. RELATED WORK In this section, we first introduce the Fat-tree topology in DCNs, then the distributed two-layer indexing framework together with two critical design criteria: index completeness and uniqueness. 2.1. Fat-tree topology As the core infrastructure of cloud systems, data centers employ switches and high-speed links to connect a great number of servers, among which the DCN plays an important role in communications and computing. Compared to traditional P2P networks, DCN structures are designed to satisfy higher requirements such as scalability, availability, energy efficiency and robustness. In [12], DCN topologies are divided into three categories based on their structural features: server-centric, switch-centric and enhanced architectures.
Switch-centric architectures, which can be further divided into flat, tree-like and unstructured topologies, form a main category of DCN architectures. In this type of DCN, the switches are enhanced for networking and routing, while the functions of servers remain unmodified. In server-centric DCNs, by contrast, low-end switches are used merely for message forwarding, while servers handle networking and routing. Since servers are programmable and powerful, more intelligent topologies (usually recursively defined) are designed for server-centric DCNs. As for the enhanced architectures, wireless or optical devices are generally employed for capacity enhancement. As a subclass of switch-centric DCNs, tree-like DCNs connect devices by links similarly to a multi-rooted tree. Switches are divided into multiple layers, and servers are linked to the bottom-layer switches. Most tree-like DCNs divide the lower-layer switches and servers into substructures, like the pod in Fat-tree. A k-pod Fat-tree consists of three layers of k-port switches. In each pod, the k/2 aggregation-layer switches and the k/2 edge-layer switches interconnect as a complete bipartite graph. Every switch in the aggregation layer connects to k/2 switches in the core layer, and every switch in the edge layer connects to k/2 servers. Thus, a k-pod Fat-tree can connect k^3/4 servers. Ref. [8] introduces different IP addressing rules for switches and servers. A pod switch has an IP address of the form 10.pod.swi.1, where pod ∈ [0, k−1] denotes the pod number, and swi ∈ [0, k−1] denotes the position of the switch in the pod (starting from left to right, bottom to top). The address of a server is 10.pod.swi.ID, where pod and swi follow the address of the edge switch to which the server connects, and ID ∈ [2, k/2+1] (starting from left to right) denotes the server's position in that subnet.
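Under the addressing rules above, the pod-switch and server IPs can be enumerated mechanically. A minimal Python sketch (core-layer switches follow a separate rule in [8] and are omitted here):

```python
def fat_tree_addresses(k):
    """Enumerate pod-switch and server IP addresses in a k-pod Fat-tree.

    Pod switches use 10.pod.swi.1 with swi in [0, k-1] (edge layer first,
    bottom to top); servers use 10.pod.swi.ID with ID in [2, k/2 + 1].
    """
    half = k // 2
    switches, servers = [], []
    for pod in range(k):
        for swi in range(k):             # swi < k/2: edge layer; else aggregation
            switches.append(f"10.{pod}.{swi}.1")
        for swi in range(half):          # servers hang off edge switches only
            for sid in range(2, half + 2):
                servers.append(f"10.{pod}.{swi}.{sid}")
    return switches, servers

switches, servers = fat_tree_addresses(4)
assert len(servers) == 4 ** 3 // 4       # a k-pod Fat-tree connects k^3/4 servers
```

For k = 4 this yields 16 servers, matching Figure 1.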
Figure 1 illustrates a fat-tree topology with four pods and examples of the addressing scheme. Because of its simple architecture, high availability and strong robustness, fat-tree has been employed in many enterprises' data centers, such as Cisco's [12]. FIGURE 1. A fat-tree topology with four pods. 2.2. Distributed indexing systems 2.2.1. Related multi-dimensional indexing schemes The performance of data retrieval in traditional cloud systems relies heavily on parallelism, i.e. scanning all the data in parallel. Such methods are simple but inefficient and can easily incur immense network traffic. In recent years, many studies have been conducted to design efficient indexing schemes for distributed systems. For example, Sioutas et al. designed the ART+ structure [21] to support efficient range queries in large-scale, decentralized environments. The outer level of ART+ is a dynamic and fault-tolerant ART structure [22], and each cluster-node in ART+ is organized into a D3-tree [23], an enhanced dynamic version of the D2-tree [24]. ART+ is highly fault-tolerant and can improve overall query efficiency while achieving load balancing. Kokotinis et al. [25] proposed the NSM-tree index scheme based on the M-tree and the MapReduce framework. The M-tree [26] works well for efficient range and kNN queries in many conditions. While its structure is similar to the R-tree, the M-tree can have large overlap areas, a problem that remains hard to eliminate. The space-filling curve is another popular technique that can map multi-dimensional data into a 1D space. Multi-dimensional indexing schemes based on space-filling curves, such as Squid [27], CISS [28] and SCRAP [29], have been well studied. These schemes can achieve good locality and load balancing in low-dimensional scenarios, but query locality can become very poor for even three dimensions because of the curse of dimensionality [29].
In addition, schemes based on space-filling curves are generally built on P2P overlay networks. In data centers storing large-scale data items, indexing schemes with space-filling curves can cause query congestion because of their dimensionality reduction [28]. In FR-Index, we adopt the R-tree as the underlying indexing structure, given that the R-tree is widely used in multi-dimensional applications and offers superior performance and flexibility. 2.2.2. Distributed two-layer indexing schemes To reduce network traffic and index maintenance overhead, Wu and Wu [6] proposed a two-layer indexing framework based on an overlay network, shown in Fig. 2, for efficient data retrieval in cloud systems. FIGURE 2. The distributed two-layer framework. In this indexing framework, each server in the cloud system plays two roles, namely overlay node and storage node. To build a distributed two-layer indexing system, each server first builds a local index for its data; then, some index entries in each server are selected and disseminated in the cluster as the global index. To process a query, the cluster first searches the global index to find a responsible server S, and then forwards the query to S. Afterwards, data retrieval is processed through the local index of S and the results are returned to the user. The framework supports most existing index structures and is compatible with many distributed file systems such as Dynamo [3]. Some recent indexing schemes based on this framework are presented in Table 1.
TABLE 1. Comparison between FR-Index and related two-layer indexing studies.

Year | Scheme         | Index structure | Data dimension | Overlay              | Network type
2010 | CG-Index [5]   | B+-tree         | 1              | BATON [13]           | P2P
2010 | RT-CAN [14]    | R-tree          | >1             | CAN [15]             | P2P
2011 | QT-Chord [16]  | quad-tree       | >1             | Chord [17]           | P2P
2011 | TLB-Index [18] | bitmap          | 1              | BATON [13]           | P2P
2015 | RB-Index [9]   | R-tree          | >1             | BCube [19]           | DCN
2015 | FT-Index [10]  | B+-tree         | 1              | Fat-tree [8]         | DCN
2015 | U2-Tree [11]   | B+-tree         | 1              | Tree-like topologies | DCN
2016 | RT-HCN [7]     | R-tree          | >1             | HCN [20]             | DCN
2018 | FR-Index       | R-tree          | >1             | Fat-tree [8]         | DCN
One key observation from Table 1 is that, roughly before 2015, most indexing schemes concentrated on P2P networks, which is not surprising since the two-layer indexing framework mentioned above was proposed on top of P2P overlay networks. After 2015, however, designing efficient indexing schemes in DCNs became an important issue. Hong et al. [7] and Gao et al. [9] proposed two-layer distributed indexing schemes for HCN (RT-HCN) and BCube (RB-Index), respectively. Though FR-Index, RT-HCN and RB-Index are all two-layer schemes, there exist significant differences between FR-Index and the other two. The hierarchical irregular compound network (HCN) [20] is a recursively defined architecture based on n-port switches and dual-port servers. Hong et al. [7] designed a 2D RT-HCN based on HCN(4,n) and provided a perspective on extending the index to data with three or more dimensions. However, since HCN(4,n) is a typical 2D overlay network, this extension appears too complicated to be practical.
FR-Index, by contrast, supports datasets with any number of dimensions, which is more scalable and practical. RB-Index [9] is another two-layer indexing scheme, designed for BCube, where a BCube_k is constructed by connecting n lower-level BCube_{k−1}s with n^k n-port switches. In RB-Index, the index partitioning scheme constructed on BCube_k can support datasets with up to k+1 dimensions. An RB-Index system is composed of n+1 indexing spaces, where one (k+1)-dimensional space is for the BCube_k and n k-dimensional spaces are for the n BCube_{k−1}s. Such an architecture lets RB-Index support up to ((n+1)k+1)-dimensional data. However, if the number of data dimensions is less than k, an extra zero vector is appended to the data, leading to considerable extra storage cost; if the number of data dimensions is more than (n+1)k+1, some dimensions are not indexed, leading to a full cluster scan which can greatly degrade overall system performance. In FR-Index, we employ a set of indexing instances, each in charge of three dimensions, to cover all the data dimensions, which is not only more flexible but also more efficient. Comparisons between RB-Index and FR-Index are discussed in Section 6.4. 2.2.3. Index completeness and uniqueness In FR-Index, when selecting a portion of local index nodes as the global index, completeness and uniqueness must be guaranteed [14]. Definition 1 (Index completeness) Suppose server N_i selects a set of index nodes S_lg from its local index S_l to publish as the global index. Index completeness is satisfied if and only if every data item d_i stored in server N_i is contained in one of the nodes of S_lg. Definition 2 (Index uniqueness) Suppose server N_i selects a set of index nodes S_lg from its local index S_l to publish as the global index. Index uniqueness is satisfied if and only if, for any index node n_i ∈ S_l and its ancestor node n_j ∈ S_l, n_i ∈ S_lg → n_j ∉ S_lg and n_j ∈ S_lg → n_i ∉ S_lg. Take Fig. 3 for example.
Suppose the tree in the left part, say an R-tree, is the local index S_l of server N_i. A set of nodes in S_l needs to be selected and published in the cluster as the global index. To satisfy index completeness and uniqueness, we select the shaded nodes with red borders (nodes {b, g, h, i}) in the right tree as the global index. FIGURE 3. Index completeness and uniqueness. 3. FR-INDEX All servers in the data center participate in constructing and maintaining the indexing instances of FR-Index. In addition, we designate an individual server as the historical data collector (Collector for short) of FR-Index. The Collector gathers historical data as the basis of several decisions described below. An FR-Index system is composed of a set of index instances denoted by I = {I1, I2, …, Iw}. An index instance Ii indexes an 'indexing space' denoted by Ii.space, which is composed of several selected dimensions of the dataset. The FR-Index Collector generates Ii.space and informs all servers of it. Each server builds a local R-tree to index its local data on the dimensions contained in Ii.space. Then each server publishes a portion of its local R-tree index nodes among the cluster to compose the global index. Thus, we obtain a distributed global index, with each server maintaining a portion of it. An index instance Ii can be regarded as the combination of all local indexes and the global index built on Ii.space. Suppose that a dataset is composed of n attributes. Every item in the dataset can be regarded as an object in an n-dimensional space denoted by D = {d0, …, dn−1}. We then define Ii.space ⊆ D. Additionally, a multi-dimensional query is denoted as q(Ctr), where Ctr = {ctr1, …, ctru} is a set of query criteria on u dimensions. We take a set Qd = {qd1, …, qdu} to represent these u dimensions. Obviously, Qd ⊆ D.
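Returning to the completeness and uniqueness criteria of Section 2.2.3, the selection in Fig. 3 can be viewed as a top-down cut of the local tree: publishing a node covers its entire subtree (completeness) and stops the descent, so no ancestor/descendant pair is ever selected together (uniqueness). A toy Python sketch (the node names and the publish predicate are hypothetical):

```python
def select_global_nodes(node, publish_here):
    """Return a node set satisfying index completeness and uniqueness.

    Walk the local tree top-down; publishing a node covers its whole
    subtree and stops recursion, so the result is an ancestor-free cover.
    """
    if publish_here(node) or not node.get("children"):
        return [node["name"]]
    selected = []
    for child in node["children"]:
        selected.extend(select_global_nodes(child, publish_here))
    return selected

# Toy local index: root a with children b and c; c has children g, h, i.
tree = {"name": "a", "children": [
    {"name": "b", "children": []},
    {"name": "c", "children": [{"name": "g"}, {"name": "h"}, {"name": "i"}]},
]}
# Publish b directly but descend into c: reproduces the selection {b, g, h, i}.
print(select_global_nodes(tree, lambda n: n["name"] == "b"))  # ['b', 'g', 'h', 'i']
```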
To help understand these symbols, we take a SQL query q as an example. Assume the query condition of q is 'where NAME = 'Jack' and GENDER = 'male' and NO = '123''. Then the query criteria of q are Ctr = {NAME = 'Jack', GENDER = 'male', NO = '123'}, and the corresponding query dimensions of q are Qd = {NAME, GENDER, NO}. The symbols and notations used in this paper are summarized in Table 2, some of which will be defined in the following sections.

TABLE 2. Notations and symbols.

Term     | Definition
Ii       | The ith index instance in the FR-Index system
Ii.space | The indexing space of index instance Ii
n        | The number of dimensions of the dataset
D        | The dimensions of the dataset: D = {d0, …, dn−1}
B        | The bounding box covering all the spatial data objects: B = (b0, …, bn−1)
bi       | The ith interval of B: bi = [li, ui]
q(Ctr)   | The multi-dimensional query with query criteria Ctr
Ctr      | The query criteria for a specific query: Ctr = {ctr1, …, ctru}
Qd       | The dimensions of Ctr: Qd = {qd1, …, qdu}
St       | The tth server
pirt     | The potential indexing range of St
INt      | The node set selected from St as global index: INt = {int1, …, intk}
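As a toy illustration of the Ctr/Qd notation from the SQL example (the full dimension set D used here is hypothetical):

```python
# Query criteria Ctr as a mapping; the query dimensions Qd are its keys.
ctr = {"NAME": "Jack", "GENDER": "male", "NO": "123"}
qd = set(ctr)                                    # Qd = {NAME, GENDER, NO}
D = {"NAME", "GENDER", "NO", "AGE", "CITY"}      # hypothetical dimension set D
assert qd == {"NAME", "GENDER", "NO"} and qd <= D   # Qd is a subset of D
```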
3.1. Selecting indexing dimensions In most cases, a query does not cover many dimensions, which means that for a query q(Ctr), the cardinality of the corresponding query dimension set Qd is not too large. Thus, it is necessary to reduce the dimensionality of the proposed multi-dimensional index. However, a single index built on a few dimensions might not facilitate processing all queries. To better manage the indexing system, we build a set of index instances I and fix each ∣Ii.space∣ to a constant value in FR-Index. Note that, different from most recursively defined server-centric topologies, the architecture of Fat-tree strictly consists of three layers, i.e. the core, aggregation and edge layers (see Fig. 1).
Benefiting from the hierarchical architecture of Fat-tree, a 3D hypercube (indexing space) can be appropriately partitioned and designated to the servers under the edge layer, regardless of the order k of the Fat-tree. We now show how to determine a set of three-dimensional indexing spaces in FR-Index. The FR-Index Collector collects query samples by requesting servers for their logs. To accelerate log collection, an optional method is stratified random sampling: for example, we can send log requests only to some randomly selected servers, where the sample quantity can be customized or self-tuned. The Collector then analyses these query samples to decide on the indexing spaces. The Collector traverses all query samples and extracts each query's dimensions Qd. A histogram is maintained to record the occurrence frequency Pi of every distinct Qdi. Then, all distinct Qd's are sorted by occurrence frequency in descending order, denoted as a collection Q = {Qd1, Qd2, …, Qdm}. The Collector selects the first x sets in Q by calculating an integer x which satisfies ∑_{j=1}^{x} Pj ≥ Pthr, where Pthr ∈ [0, 1] is a threshold controlling the performance of FR-Index. Generally, a higher Pthr may incur more index instances, with higher maintenance costs but faster query processing. Now Q is pruned into Qp = {Qd1, Qd2, …, Qdx}. We regard Qp as depicting the features of Pthr×100% of all historical queries, as well as of subsequent queries. Based on Qp, the Collector finds a collection Dans = {Dc1, Dc2, …, Dcy} with the following three properties: (a) ∀i ∈ {1, 2, …, y}, Dci ⊆ D and ∣Dci∣ = 3. (b) ∀j ∈ {1, 2, …, x}, if ∣Qdj∣ < 3, then ∃i ∈ {1, 2, …, y} such that Qdj ⊆ Dci. (c) ∀j ∈ {1, 2, …, x}, if ∣Qdj∣ ≥ 3, then ∃i ∈ {1, 2, …, y} such that Dci ⊆ Qdj. Each set in Dans will become a 3D indexing space on which an index instance is built. All of these index instances constitute an FR-Index system.
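The pruning of Q into Qp described above can be sketched in a few lines of Python (the sampled query dimension sets below are hypothetical):

```python
from collections import Counter

def prune_query_dimensions(query_dims, p_thr):
    """Prune the frequency-sorted Qd collection down to Qp.

    query_dims: one frozenset of dimensions (Qd) per sampled query.
    Returns the shortest frequency-ordered prefix whose cumulative
    occurrence frequency reaches p_thr.
    """
    hist = Counter(query_dims)
    total = sum(hist.values())
    qp, covered = [], 0.0
    for qd, count in hist.most_common():
        qp.append(qd)
        covered += count / total
        if covered >= p_thr:
            break
    return qp

# 60% of queries touch {lat, lon}, 30% {topic}, 10% {color, size}.
samples = ([frozenset({"lat", "lon"})] * 6 + [frozenset({"topic"})] * 3
           + [frozenset({"color", "size"})])
qp = prune_query_dimensions(samples, 0.8)
assert qp == [frozenset({"lat", "lon"}), frozenset({"topic"})]  # 0.6 + 0.3 >= 0.8
```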
Since Qp depicts the features of Pthr×100% of all historical queries, properties (b) and (c) of Dans guarantee that FR-Index can efficiently process Pthr×100% of subsequent queries. 3.2. Partitioning indexing space The information about the selected indexing spaces is sent to all servers by the Collector. Once a server receives this information, it builds a local R-tree index on the dimensions contained in each indexing space. To better illustrate our proposal, we take one index instance as an example to show how the system works. As mentioned above, a server needs to maintain a portion of the global index. A challenge is to determine the range of the global index that each server should be responsible for. In the following discussion, we call this range the potential indexing range (PIR). As a tree-like DCN, the hierarchical structure of Fat-tree offers a convenient and efficient way to partition the indexing space such that we can generate a PIR for every server. In the entire data center, all multi-dimensional data lie within a data boundary denoted by B = (b0, b1, b2, …, bn−1), i.e. an n-dimensional hypercube. Here each bi is a closed bounded interval [li, ui] describing the range covered by the data objects along dimension di. Suppose that we have chosen an indexing space Ij.space = (d0, d1, d2). Since B′ = (b0, b1, b2) is the 'meaningful' subspace of Ij.space for our purposes, we consider Ij.space = B′ = (b0, b1, b2) in the following discussion for simplicity. In a k-pod Fat-tree, we code a server St by t = (k/2)^2·pod + (k/2)·swi + (ID − 2), where pod, swi and ID are the parameters in the IP address of this server. Intuitively, we first partition the indexing space into k parts (the number of pods) along the first dimension, then k/2 parts (the number of edge switches in each pod) along the second dimension, and finally k/2 parts (the number of servers connected to each edge switch) along the third dimension.
Now we obtain k^3/4 equal-sized partitions of the indexing space and map each partition to a server. Therefore, for server St, its PIR, denoted by pirt, can be expressed by Equation (1):

pirt = {[l0t, u0t], [l1t, u1t], [l2t, u2t]}
     = { [l0 + pod(u0−l0)/k,       l0 + (pod+1)(u0−l0)/k],
         [l1 + swi(u1−l1)/(k/2),   l1 + (swi+1)(u1−l1)/(k/2)],
         [l2 + (ID−2)(u2−l2)/(k/2), l2 + (ID−1)(u2−l2)/(k/2)] }   (1)

Figure 4 shows an example of the index space partitioning strategy in a 4-pod Fat-tree. FIGURE 4. An example of index space partitioning in a 4-pod Fat-tree. The space is split along dimension D0 first, then along D1 and D2 in sequence. 3.3. Publishing in FR-Index To build the global index, a server, say St, adaptively selects a set of R-tree nodes, INt = {int1, …, intk}, from its local index and publishes them as the global index. The index nodes in INt should cover all data stored in St, i.e. satisfy the index completeness criterion of Section 2.2.3. Each inti in INt is published as a tuple (ipt, mbri), where ipt is the IP address of St and mbri is the bounding box of inti. 3.3.1. Mapping scheme Wang et al. [4] proposed a mapping scheme to regulate the publishing process. We adapt this scheme to distribute the global index among the servers in Fat-tree. For an R-tree node inti in St to be published, we take the center inti.c and radius inti.r of its bounding box as the criteria for mapping. For example, if the bounding box of inti is {[l0, u0], [l1, u1], [l2, u2]}, then inti.c = ((l0+u0)/2, (l1+u1)/2, (l2+u2)/2), and inti.r = (1/2)·sqrt((u0−l0)^2 + (u1−l1)^2 + (u2−l2)^2). Furthermore, suppose inti.c = (inti.c0, inti.c1, inti.c2), where inti.cj (j = 0, 1, 2) denotes the coordinate value of inti.c on the jth dimension. Clearly, inti.c will be contained in a certain server's PIR, i.e. its designated indexing subspace.
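The partitioning of Equation (1), together with its inverse lookup (Equation (2) below), amounts to uniform bucketing along the three dimensions. A minimal Python sketch (unit bounds are assumed purely for the example):

```python
def pir(k, pod, swi, sid, bounds):
    """Potential indexing range of server 10.pod.swi.sid in a k-pod Fat-tree:
    dimension 0 is split into k parts by pod, dimensions 1 and 2 into k/2
    parts each by edge-switch and server position (Equation (1)).

    bounds: [(l0, u0), (l1, u1), (l2, u2)] -- the 3-D indexing space.
    """
    half = k // 2
    (l0, u0), (l1, u1), (l2, u2) = bounds
    return [
        (l0 + pod * (u0 - l0) / k,          l0 + (pod + 1) * (u0 - l0) / k),
        (l1 + swi * (u1 - l1) / half,       l1 + (swi + 1) * (u1 - l1) / half),
        (l2 + (sid - 2) * (u2 - l2) / half, l2 + (sid - 1) * (u2 - l2) / half),
    ]

def initial_server(k, point, bounds):
    """Inverse mapping: the (pod, swi, ID) whose PIR contains a point."""
    half = k // 2
    (l0, u0), (l1, u1), (l2, u2) = bounds
    x0, x1, x2 = point
    pod = min(int(k * (x0 - l0) / (u0 - l0)), k - 1)
    swi = min(int(half * (x1 - l1) / (u1 - l1)), half - 1)
    sid = min(int(half * (x2 - l2) / (u2 - l2)), half - 1) + 2
    return pod, swi, sid

bounds = [(0.0, 1.0)] * 3
box = pir(4, pod=1, swi=0, sid=2, bounds=bounds)
# The PIR of 10.1.0.2 covers [0.25,0.5] x [0,0.5] x [0,0.5]; any point inside
# it maps back to that same server.
assert initial_server(4, (0.3, 0.1, 0.1), bounds) == (1, 0, 2)
```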
According to the indexing space partitioning scheme in Section 3.2, we can calculate this server's IP address 10.pod.swi.ID from inti.c:

pod = ⌊k(inti.c0 − l0)/(u0 − l0)⌋
swi = ⌊k(inti.c1 − l1)/(2(u1 − l1))⌋
ID  = ⌊k(inti.c2 − l2)/(2(u2 − l2))⌋ + 2   (2)

For simplicity, we call this server the initial server, Sx. To publish inti as a global index node, St first sends inti to Sx. Then Sx compares inti.r with a predefined threshold, say Rthr. If inti.r > Rthr, Sx determines the servers (called candidate servers) whose PIRs intersect with the bounding box mbri of inti, and inti is further sent to those servers as a global index node. If inti.r ≤ Rthr, inti is stored only at Sx. If inxi.r > Rthr, denote the overlapping region between inxi and range as Regiono. There exists a server Sy whose piry overlaps with Regiono, and piry satisfies the following two conditions simultaneously: piry overlaps with inxi; according to the global index mapping strategy in Section 3.3, server Sy with piry has stored inxi as a global index node. piry overlaps with range; thus piry must overlap with range.sspace, which means q(range) will be forwarded to Sy. In summary, Sy, storing inxi like St2, will be searched and the results contained in inxi will be retrieved. □ In general, the range query strategy in FR-Index guarantees the completeness of query results. 4.3. Query on skewed datasets In this section, we discuss query processing on skewed datasets, corresponding to the skewed data mapping scheme in Section 3.4. For a point query q(key) with key = (v1, v2, v3), we perform the following steps. The server Sx receiving q(key) first checks whether the index instance Iino matching q(key) is constructed on a skewed dataset. If so, Sx transforms key to key′ = (v1′, v2′, v3′):

vj′ = cdfj(vj)×∣bj∣ + lj   (j = 1, 2, 3),   (9)

where bj = [lj, uj] denotes the data range on dimension dj. According to the indexing space partitioning scheme, Sx determines the server Sinit whose PIR pirinit contains key′ and forwards a message 〈q(key), key′, ino〉 to Sinit.
Sinit first generates a hypersphere at point key′ with radius Rthr and then forwards 〈q(key), key′, ino〉 to the candidate servers. Sinit and the candidate servers search for key in their global index nodes and forward 〈q(key), ino〉 to the servers that published the global nodes containing q(key). The final results of q(key) are then fetched from the local indexes of these servers. Similarly, for a range query q(range), we first transform range to range′, then perform query steps similar to those in Section 4.2. 4.4. Discussion on Rthr We can now discuss the impact of Rthr on the FR-Index system. According to Section 3.3, a smaller Rthr incurs more index node replicas across multiple servers, which increases the maintenance cost. On the other hand, according to the query processing strategy, a larger Rthr implies a larger search space, which means we must search more servers to retrieve complete results for a query, reducing query processing efficiency. In FR-Index, we calculate a specific Rthr for each index instance. The Collector samples R-tree nodes from the local indexes. After several trials of the FR-Index implementation, we found that when the Rthr of an index instance is slightly larger than the average radius of 70% of the sampled nodes, the trade-off between query efficiency and index maintenance is near-optimal. 5. INDEX MAINTENANCE AND UPDATING In reality, the query pattern of the cluster can alter significantly in some scenarios, especially in data-intensive applications. Therefore, dynamic index updating is a crucial component of FR-Index. Updates to FR-Index fall into two categories. Update within index instances: at first, we have no knowledge about the query pattern or the data update pattern. Therefore, in an h-level local R-tree, we publish the index nodes at level h−1 (the level-h nodes are leaf nodes). After a period of time, we adopt a cost model to update some published index nodes to reduce the overhead of index maintenance. More details are discussed in Section 5.1.
Update to index instances: another update requirement for our system is to update the index instances themselves. We propose a simple and efficient strategy for this requirement. Each server maintains a histogram recording the access status of every index instance. The FR-Index Collector first requests the histograms from all servers, then adopts a least-recently-used policy to delete obsolete index instances and add new ones, as discussed further in Section 5.2.

5.1. Update in index instances

As mentioned before, each server S_t in the cluster needs to select a portion of its local index to publish as the global index. Which local index nodes should be selected, however, is a critical question. In this section, we introduce a cost model for node selection and dynamic index update.

5.1.1. Cost model

Suppose a server S_t selects an index node set IN_t = {in_t1, …, in_tk} from its local R-tree as the global index, satisfying index completeness and uniqueness. In FR-Index, the cost C(in) of an index node in is composed of the index maintenance cost C_m(in) and the query processing cost C_q(in):

$$C(in) = C_m(in) + C_q(in). \qquad (10)$$

The index maintenance cost C_m(in) represents the cost of index node updates. If a published local index node is split or merged locally because of a data update, the corresponding global index node needs to be updated as well. When a published local index node is split, the server storing this node should publish three index update messages, i.e. one message deleting the obsolete node and two messages publishing the two new nodes. Similarly, if two published local index nodes are merged, two messages deleting the two obsolete nodes and one message publishing the new node need to be published. For a node in, suppose the probabilities of splitting and merging in are p_split(in) and p_merge(in), respectively; then:

$$C_m(in) = 3\,(p_{split}(in) + p_{merge}(in)). \qquad (11)$$

Now the values of p_split(in) and p_merge(in) need to be specified.
A two-state Markov chain model in [14] was used to calculate the probabilities of splitting or merging each R-tree node. To achieve good precision, this method needs a histogram recording the historical update pattern and considerable computation for update prediction. In FR-Index, this method is employed to calculate p_split(in) and p_merge(in) of the leaf nodes in the R-tree. For non-leaf nodes, we employ the method in [16] for the probability calculation. Suppose the probability of inserting a key into node in is p_1(in), and the probability of deleting a key from in is p_2(in); then we have:

$$p_{split} = \frac{(p_2/p_1)^{3m/2} - (p_2/p_1)^{m}}{(p_2/p_1)^{2m} - (p_2/p_1)^{m}}, \qquad (12)$$

$$p_{merge} = \frac{(p_2/p_1)^{2m} - (p_2/p_1)^{3m/2}}{(p_2/p_1)^{2m} - (p_2/p_1)^{m}}, \qquad (13)$$

where m is an R-tree parameter indicating the minimum number of subtrees of a parent node. For any non-leaf node in, suppose its child node set is {c_1, c_2, …, c_i}; a key is inserted into (deleted from) in exactly when some child splits (merges), so p_1 and p_2 can be calculated as:

$$p_1(in) = 1 - \prod_{j=1}^{i} (1 - p_{split}(c_j)), \qquad (14)$$

$$p_2(in) = 1 - \prod_{j=1}^{i} (1 - p_{merge}(c_j)). \qquad (15)$$

According to Equations (12)–(15), once we obtain p_split and p_merge of the leaf nodes by the Markov model, the update probabilities of the non-leaf nodes can be calculated bottom-up. The query processing cost C_q(in) consists of an essential cost and a false positive cost p_fp. In FR-Index, we focus mainly on p_fp since the essential cost cannot be further reduced. To illustrate a false positive query, suppose two published global R-tree nodes in_i^g1 and in_j^g2 have an intersecting region reg_isc, and the corresponding local index nodes in_i^l1 and in_j^l2 are stored in servers S_i and S_j, respectively. Suppose a point key for query q(key) is covered by reg_isc, but key is actually stored in S_i instead of S_j. When a user sends q(key) to FR-Index, this query will be forwarded to S_i and S_j simultaneously, thus the query to S_j is a false positive query. Suppose B(in) is the volume of the bounding box of node in, and D(in) represents the volume of the actual data range of in; then the false positive probability p_fp is:

$$p_{fp} = \frac{B(in) - D(in)}{B(in)}. \qquad (16)$$

And we have C_q(in) = p_fp(in).
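The calculations of Equations (12)–(16) can be sketched as follows. This is a minimal sketch with illustrative function names; Equations (12)–(13) assume p_1 > 0 and p_1 ≠ p_2 so the denominator is non-zero, and we read Equations (14)–(15) as: a key is inserted into (deleted from) a non-leaf node when at least one of its children splits (merges).

```python
def split_merge_probs(p1, p2, m):
    """Equations (12)-(13): split/merge probabilities of a node whose
    fill level performs a random walk between m (merge boundary) and
    2m (split boundary), starting from 3m/2, with insert probability
    p1 and delete probability p2. Assumes p1 > 0 and p1 != p2."""
    r = p2 / p1
    denom = r ** (2 * m) - r ** m
    p_split = (r ** (3 * m / 2) - r ** m) / denom
    p_merge = (r ** (2 * m) - r ** (3 * m / 2)) / denom
    return p_split, p_merge

def parent_probs(children_split, children_merge):
    """Equations (14)-(15): the parent gains (loses) a key when at
    least one child splits (merges)."""
    prod_s = prod_m = 1.0
    for ps, pm in zip(children_split, children_merge):
        prod_s *= 1 - ps
        prod_m *= 1 - pm
    return 1 - prod_s, 1 - prod_m

def false_positive_prob(bbox_volume, data_volume):
    """Equation (16): fraction of the bounding box not covered by
    the actual data range."""
    return (bbox_volume - data_volume) / bbox_volume
```

Note that the two boundary probabilities of Equations (12)–(13) always sum to one, which the random-walk reading predicts.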
After obtaining the index maintenance cost C_m(in) and the query processing cost C_q(in), we can calculate the cost of node in:

$$C(in) = 3p_{split}(in) + 3p_{merge}(in) + p_{fp}(in). \qquad (17)$$

In addition, according to the index mapping and publishing strategy in Section 3, an index node in can be forwarded to multiple servers as the global index when its radius in.r > R_thr, which means the communication cost of updating in can be significantly influenced by the number of servers storing in. We therefore refine the cost model:

$$C(in) = \begin{cases} \dfrac{in_s + 1}{in_s}\left[3p_{split}(in) + 3p_{merge}(in) + p_{fp}(in)\right], & in.r > R_{thr},\\[4pt] 3p_{split}(in) + 3p_{merge}(in) + p_{fp}(in), & \text{otherwise}, \end{cases} \qquad (18)$$

where in_s is the number of PIRs that intersect the bounding box of in. Notice that the cost model (Equation (18)) is based on the premise that the switches in the fat-tree are layer-2 devices, which can forward packets at line speed. We define the routing cost of forwarding one packet through layer-2 switches as 1. If the switches in the data center are layer-3 switches, the store-and-forward function takes effect. We define the routing cost of forwarding one packet through layer-3 switches as the number of hops passed by the packet. A k-pod Fat-tree data center has k³/4 servers. For any server S, the number of hops needed for communication between S and another server is 1, 3 or 5. Suppose S communicates with every other server with equal probability; then the expected number of hops is:

$$E(k) = \frac{(k/2 - 1) + 3(k/2 - 1)(k/2) + 5(k - 1)(k^2/4)}{k^3/4} = 5 - \frac{2}{k} - \frac{4}{k^2} - \frac{4}{k^3}. \qquad (19)$$

Therefore, when layer-3 switches are employed in the fat-tree, the cost of index node in is:

$$C(in)' = E(k)\,C(in). \qquad (20)$$

5.1.2. Index node selection and update

In our proposal, each server chooses some index nodes from its local R-tree index and publishes them into the global index. A published higher-level R-tree node may incur less update cost while generating more false positives.
Besides, its bounding box may overlap more servers' PIRs, increasing storage cost and query processing complexity. Therefore, it is crucial to choose 'proper' local index nodes to publish. According to Equation (17), if a global index node is updated frequently but scarcely queried, it has a relatively high maintenance cost and should be replaced by its parent node; if a node is queried frequently but scarcely updated, it has a high query processing cost and should be replaced by its child nodes. The index node set selected from server S_t and published as the global index should not only satisfy index completeness and uniqueness but also minimize the total cost. Ref. [14] introduces a dynamic programming algorithm, which we adapted for index node selection in FR-Index, as shown in Algorithm 2. The time complexity of Algorithm 2 is O(|V|), where |V| denotes the number of nodes in the R-tree.

Algorithm 2: Index node selection

5.2. Update to index instances

In reality, the query pattern of FR-Index can change significantly under some scenarios, such as hot events in social applications. Two problems arise in this situation: first, most queries cannot match any index instance, which greatly degrades query efficiency; second, some index instances are scarcely queried but still need to be updated periodically, which wastes storage and network resources. Therefore, it is necessary to dynamically update the index instances as the workload varies. In FR-Index, the Collector periodically (assume the time window is T) collects the query logs of all servers in a stratified random sampling manner. Based on the analysis of these logs, the Collector determines the queries that mismatch all the index instances.
If, within one window T, the percentage of queries matching index instances is less than P_thr (see Section 3.1), the Collector executes the following steps:

(1) Suppose the query criteria set of all unmatched queries is Q_un. Using the indexing space construction strategy in Section 3.1, the Collector calculates a new data dimension collection D_new. D_new should cover P_thr × 100% of the items in Q_un.
(2) For each set D_ci in D_new, the Collector first calculates the percentage P_i of queries covered by D_ci, then sorts the D_ci by their P_i in descending order.
(3) For each set D_cj in the current indexing space collection D_ans, the Collector calculates its P_j, then sorts the D_cj by their P_j in descending order.
(4) The Collector substitutes the lowest-ranked items in D_ans with the top items in D_new such that the sum of the corresponding P_j in D_ans exceeds P_thr.

These new indexing spaces in D_ans will be used to construct new index instances, and the obsolete index instances will be deleted. To ensure the availability of FR-Index, the Collector ought to first construct the new index instances, then substitute the obsolete ones one by one during off-peak periods.

6. PERFORMANCE EVALUATION

We evaluated the proposed indexing scheme on Amazon's EC2 platform. Each instance has a 2.5 GHz dual-core Intel Xeon processor and 4 GB of memory. We organized the EC2 computing units into a simulated data center with Fat-tree topology. Table 3 lists the common experiment settings.

TABLE 3. Common experiment settings.
Size of data center: 4-pod (16 servers) or 6-pod (54 servers)
Number of stored data items: 20k, 40k, 60k, 80k or 100k on each server
Datasets: YearPredictionMSD, Uniform_3d, Zipfian_3d
Number of queries: 1k, 2k, 3k, 4k or 5k
R_thr: larger than the radius of 70% of published index nodes

Datasets. One real dataset and two synthetic datasets are used in our experiments. The real dataset is YearPredictionMSD [31], with a size of 199.5 GB, which comprises 54 metadata and audio features from hundreds of thousands of songs. Table 4 shows the details of the second to sixth dimensions of YearPredictionMSD. Accordingly, we synthesized two datasets based on the 7digitalid, latitude and longitude dimensions of YearPredictionMSD: Uniform_3d, generated following a uniform distribution, and Zipfian_3d, generated strictly following a Zipfian distribution with skewness factor 0.8. For each of the two synthetic datasets, we generated 600 000 data points.

TABLE 4. Details of YearPredictionMSD.
Dimension | Field name | Type | Description | Range | Standard deviation
2 | Artist 7digitalid | int | ID from 7digital.com | [4, 809205] | 142 161.827
3 | Artist familiarity | float | Algorithmic estimation | [0.0, 1.0] | 0.160
4 | Artist hotness | float | Algorithmic estimation | [0.0, 1.083] | 0.144
5 | Artist latitude | float | Latitude | [-41.281, 69.651] | 15.596
6 | Artist longitude | float | Longitude | [-162.437, 174.767] | 50.501

Index construction. For each dataset, we randomly partitioned and distributed the entire dataset over the cluster, making each server maintain roughly the same number of data points. Then, we constructed a local R-tree on each node, and published the parents of the leaf nodes in each R-tree as the global index.
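The initial publishing rule, selecting the parents of the leaf nodes (the level h−1 nodes of an h-level R-tree, cf. Section 5), can be sketched on a toy tree. The Node layout below is illustrative, not FR-Index's actual structure:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    mbr: Tuple[float, ...] = ()          # bounding box, illustrative
    children: List["Node"] = field(default_factory=list)

def parents_of_leaves(root: Node) -> List[Node]:
    """Collect the nodes whose children are all leaves, i.e. the
    level h-1 nodes of a balanced h-level R-tree."""
    if not root.children:
        return []  # a bare leaf has no parent to publish
    if all(not c.children for c in root.children):
        return [root]
    out: List[Node] = []
    for c in root.children:
        out.extend(parents_of_leaves(c))
    return out
```

On a two-level toy tree with four leaves under two inner nodes, the sketch returns exactly the two inner nodes.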
Before the query experiments, we randomly extracted 50 000 data items to generate a query sample covering all of the indexing dimensions of FR-Index, and replayed the query sample on the Fat-tree cluster to trigger updates of the global index.

Query generation. For point queries, we randomly selected a certain number of data items to generate the query sets. This method roughly simulates the data distribution of the target datasets, so the dense portions of the original datasets are queried more frequently. For range queries, the query sets are generated according to different selectivities, i.e. different percentages of the indexing space. By default, one range query covers 0.1% of the search space. The performance comparison between range queries with different selectivities is conducted in Section 6.3.

6.1. Evaluation on index construction

Initially, to construct an FR-Index instance, we selected the second, third and fourth dimensions of YearPredictionMSD, namely 7digitalid, familiarity and hotness, as the index space. Each server constructs a local index, i.e. an R-tree, for its data in the index space. Subsequently, the parents of the leaf nodes in each R-tree were selected as the global index and published in the cluster according to the mapping and publishing rules in Section 3. The global index nodes were cached in memory to accelerate query processing. We constructed FR-Index instances in a 4-pod Fat-tree with 16 servers and a 6-pod Fat-tree with 54 servers, respectively. The number of data items stored in each server was increased gradually from 20 000 to 100 000. Figure 7 shows the size of the local index and global index in the 4-pod Fat-tree (a) and the 6-pod Fat-tree (b).

FIGURE 7. Size of global index and local index: (a) in 4-pod fat-tree and (b) in 6-pod fat-tree.

A key observation in Fig.
7 is that the size of the global index is almost 10× smaller than that of the local indexes under the same setting, which indicates that FR-Index is lightweight. Moreover, this advantage is retained when the data center becomes larger, which verifies that FR-Index is scalable. In Fig. 7, we constructed a 3D index instance based on at most 100 000 data items (roughly 19 GB) in each server, resulting in a global index smaller than 40 MB. In reality, however, the data volume stored in data centers could be much larger than in our experiments. FR-Index can reuse data servers as global index containers, so data center administrators need no additional machines to store the global index, greatly reducing deployment cost. Note that the local index in the data servers, though consuming considerable storage space, is necessary for data retrieval, while the global index introduced by FR-Index, occupying far less storage, can be kept in memory for rapid query processing.

6.2. Evaluation on load balancing

In Section 3.4, we introduced PMF to balance the workload in the Fat-tree cluster, and gave the corresponding query algorithm in Section 4.3. To evaluate its effectiveness, we first employed PMF to preprocess the datasets YearPredictionMSD and Zipfian_3d, then randomly selected 20 000 × 16 data items and distributed them in the 4-pod Fat-tree, with each server owning 20 000 items. We constructed an FR-Index instance with dimensions (2,3,4) and randomly generated 5000 point queries with dimensions (2,3,4,5,6) to replay on FR-Index. For comparison, we also conducted query experiments on the original YearPredictionMSD and Zipfian_3d, along with Uniform_3d. The number of visits to each server was recorded and is presented in Fig. 8.

FIGURE 8. Visiting frequency for each server in 4-pod fat-tree.
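The PMF preprocessing relies on the per-dimension CDF transform of Equation (9), which stretches a skewed coordinate into a near-uniform one over the same range. A minimal sketch using an empirical CDF (the helper names are ours):

```python
from bisect import bisect_right

def make_cdf(samples):
    """Empirical CDF of one data dimension, built from a sample."""
    s = sorted(samples)
    n = len(s)
    return lambda v: bisect_right(s, v) / n

def rebalance(point, cdfs, lows, highs):
    """Equation (9): v_j' = cdf_j(v_j) * |b_j| + l_j, mapping a
    skewed point to a near-uniform position in the same ranges."""
    return tuple(cdf(v) * (u - l) + l
                 for v, cdf, l, u in zip(point, cdfs, lows, highs))
```

For instance, with samples {1, 2, 4, 8, 16} on a [0, 10] dimension, the skewed value 4 (the 60th percentile) is remapped to 6.0.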
We observe that under Uniform_3d, the visiting frequencies of the servers are roughly the same, while they differ greatly across servers under Zipfian_3d and YearPredictionMSD. Another observation is that the effect of PMF is considerable, especially for YearPredictionMSD. Note that the PMF algorithm is typically employed to balance the query workload in the global layer. In the following experiments, we employ PMF to preprocess all the datasets to obtain a nearly uniform distribution among servers.

6.3. Evaluation on query processing

To evaluate the query performance of FR-Index, we first placed 20 000 data items in each server randomly. Then a 3D FR-Index instance was built to facilitate query processing. As mentioned in Section 6.1, the index space was composed of the second, third and fourth dimensions of YearPredictionMSD.

6.3.1. Comparison with the scanning strategy

After the index construction, we randomly generated a certain number of 5D point queries and range queries (searching 0.1% of the entire indexing space), where the query dimensions consist of the second to sixth features of YearPredictionMSD. The number of queries was increased from 1000 to 5000 for each evaluation. We compared two query strategies: (1) FR-Index strategy: process queries with the assistance of FR-Index instances. (2) Scanning strategy: broadcast queries to all servers, each of which processes the queries locally. This strategy is similar to traditional data retrieval in distributed file systems. Note that for FR-Index, two rounds of queries were sent to the cluster in order. The first round was used to update the index instances, and the second round was used to evaluate query performance. Suppose that processing the same query set costs time T1 with the first strategy and T2 with the second. We define (T2 − T1)/T2 × 100% as the time saving ratio (TSR).
If the performance of FR-Index is better than the scanning strategy, TSR > 0; otherwise, TSR < 0. Figure 9a and b shows the TSR in point and range query processing, respectively. In the 4-pod data center, due to the extra cost of query forwarding and storage access, FR-Index did not perform very well. In the 6-pod Fat-tree topology, however, our proposal reduced the time cost by nearly 15% for point queries, and by nearly 20% for range queries. This observation implies that as the scale of the data center increases, FR-Index significantly accelerates data retrieval compared to the traditional scanning strategy.

FIGURE 9. Time saving ratio in query processing: (a) point queries and (b) range queries.

6.3.2. Evaluation on different range selectivities

For range queries, we defined range selectivity as the percentage of the indexing space covered. To evaluate the impact of selectivity on range query performance, we generated 1000 range queries with selectivities of 0.1% and 1%, respectively, then sent these queries to the 4-pod and 6-pod Fat-tree networks. Figure 10 shows the query performance at the different Fat-tree scales.

FIGURE 10. Range queries with different selectivities.

For selectivity 0.1%, when the Fat-tree scales from 16 to 54 servers, we observed a 10% speedup; for selectivity 1%, the speedup is 18%. According to the indexing space partitioning strategy in Section 3.2, under the same dataset, the indexing space designated to each server in the 4-pod Fat-tree is larger than that in the 6-pod Fat-tree. This indicates that for a range query, more servers are involved in query processing in the 6-pod Fat-tree than in the 4-pod Fat-tree, so parallelism contributes to the overall query performance to some extent.
When range selectivity increases, the initial server receiving queries can become the bottleneck, because it needs to forward queries to more candidate servers, and false positive cases increase greatly as well, which degrades the performance of the whole system.

6.4. Comparisons with RT-CAN and RB-Index

In this section, we present performance comparisons between FR-Index and RT-CAN/RB-Index. RT-CAN employs an R-tree-based indexing structure and a CAN-based P2P routing protocol to support efficient multi-dimensional data retrieval. RB-Index is another two-layer indexing system, designed for modular data centers based on the BCube overlay network. We followed [4] and [9] to implement RT-CAN and RB-Index, respectively. The R_thr in RT-CAN and RB-Index was set to be slightly larger than the average radius of 70% of the published index nodes, similar to the R_thr in FR-Index. In addition, each server stores 20 000 data items. We built a 3D FR-Index instance on a 4-pod Fat-tree with 16 servers and an RB-Index on a 4-port Bcube1 with 16 servers. We also organized a similar number of servers to simulate the CAN topology; the search space of CAN was divided and randomly designated among the cluster. To construct a 3D RB-Index system, we built one 2D main global index in Bcube1 and four 1D subsidiary global indexes in Bcube0.

6.4.1. Time cost of index construction

Figure 11a shows the index construction time of the three schemes. The results imply that the time consumption of FR-Index construction is 18–30% less than that of RT-CAN under the same data volume and server performance. The reason is that, as a type of P2P network, the routing protocol of CAN is relatively inefficient, which increases the time consumption of global index publishing.

FIGURE 11. Performance comparison between FR-Index and RT-CAN/RB-Index: (a) index construction, (b) point queries and (c) range queries.

Besides, the construction time of RB-Index exceeds that of FR-Index and RT-CAN. For FR-Index and RT-CAN, local index construction and global index publishing are performed in only one 3D indexing space. For RB-Index, however, one 2D R-tree and four B-trees need to be constructed in each server to cover all the indexing dimensions, which greatly increases the overhead of index construction and storage maintenance.

6.4.2. Time cost of query processing

We randomly generated 2000 5D point queries and replayed them on the three indexing systems, respectively. Figure 11b and c presents the performance comparisons for point queries and range queries. The results show that FR-Index saved roughly 20% of the query time compared to RT-CAN, and the performance difference between RT-CAN and FR-Index grows as the scale of the data center increases. For range queries, RB-Index preferentially employs its 2D main global index to process queries, then prunes the candidate data to get the final results, which is slower than the 3D FR-Index, as shown in Fig. 11c. For point queries, however, the query performance of RB-Index is comparable to that of FR-Index. In fact, RB-Index employs Bloom filters to reduce false positive cases, which accelerates query processing.

6.5. The scalability of indexing instances

FR-Index is designed to accelerate multi-dimensional data retrieval. With one instance in charge of three dimensions, users can construct any number of index instances to satisfy specific indexing requirements. In this section, we compare the performance of index construction and query processing between FR-Index systems with different numbers of instances. For FR-Index 1, we constructed an index instance with dimensions (2,3,4) of Table 4.
For FR-Index 2, we constructed two instances with dimensions (2,3,4) and (2,5,6). Figure 12a presents the index construction time of FR-Index 1 and FR-Index 2. Note that for each instance of FR-Index, each server must build an R-tree and publish the selected nodes among the cluster as the global index. But we observe that the construction time of FR-Index 2 is less than twice the construction time of FR-Index 1. The reason is that when constructing the first instance, we prefetch all the related data for subsequent instance construction. Thus, the disk I/O is performed only once, regardless of the number of instances built.

FIGURE 12. Comparisons between FR-Index with different instances: (a) index construction and (b) query performance.

To compare the query performance of FR-Index 1 and FR-Index 2, we generated two types of point query sets: one set, Q5d, consists of queries with dimensions (2,3,4,5,6), and the other set, Q4d, consists of queries with dimensions (2,3,5,6). Figure 12b shows the query performance of FR-Index 1 and FR-Index 2 under Q5d and Q4d. One observation is that under Q5d, the query performance of FR-Index 1 roughly equals that of FR-Index 2, while under Q4d, FR-Index 2 greatly outperforms FR-Index 1. According to the query strategy in Section 4, FR-Index generally assigns the instance whose dimensions best match the query set to process the incoming queries. For Q5d with dimensions (2,3,4,5,6), FR-Index 1 assigns instance (2,3,4) to process the queries, while FR-Index 2 assigns instance (2,3,4) or instance (2,5,6) to finish the job. Since the processing steps are similar, the query performance of FR-Index 1 and FR-Index 2 is also very similar.
For Q4d with dimensions (2,3,5,6), however, only two indexing dimensions of instance (2,3,4) in FR-Index 1 are useful for query processing, which implies more pruning cost and more false positive cases, while FR-Index 2 can simply handle the queries with instance (2,5,6). To summarize, more index instances in FR-Index incur more construction time and storage cost, but can also improve query processing performance.

6.6. Evaluation on updates

As discussed in Section 5, the updates in FR-Index are divided into two categories: update in index instances and update to index instances. Update to index instances is typically performed by the Collector during off-peak periods, while update in index instances, which employs a cost model to dynamically adjust the published global nodes, contributes more to query performance in the short term. In this section, we evaluate the performance of global index update in an FR-Index instance constructed on a 4-pod Fat-tree. We generated two types of workloads with 5000 queries from YearPredictionMSD: one produced by randomly extracting 5000 items from the target dataset, and the other produced by querying items stored exclusively in four neighboring servers. In addition, we set the update period of the global index to 30 seconds. In this experiment, we first replayed the first type of queries three times, then replayed the second type, i.e. the skewed workload, ten times. The playback period of the workloads was 10 seconds. Figure 13 shows the variation of query performance over time.

FIGURE 13. Performance of updates.

We observe that at 30 seconds, the skewed workload targeting four servers was replayed on FR-Index for the first time. Since the global index in these four servers was still adapted to the first type of workload, query performance degraded greatly with more false positive cases.
At 60 seconds, the four target servers started to adjust their published global nodes to reduce false positive queries, based on the cost model discussed in Section 5.1.1. Only after another 30 seconds, however, was the query latency caused by workload skewness relieved to some extent. Note that even after the update, system performance is not as good as in the first 30 seconds; the reason could be that the four target servers were still overloaded and thus became the system bottleneck. This problem can be mitigated by the underlying file system with techniques such as caching.

6.7. The validity of dimension selection

The indexing space selection strategy in Section 3.1 restricts the number of dimensions in an index instance to three. To evaluate the performance of indexing space selection in FR-Index, we built two different index instances: one is a 3D index instance whose index dimensions are the second to fourth features of the dataset, called 3dFR; the other is a 5D index instance whose dimensions are the second to sixth features in Table 4, called 5dFR. The indexing space partitioning and global node mapping of 5dFR were based on the first three dimensions. Construction comparisons between 3dFR and 5dFR are shown in Fig. 14. One observation is that the global index size and construction time of 3dFR are superior to those of 5dFR, whether in the 4-pod Fat-tree or in the 6-pod Fat-tree. This is not surprising, since the global index of 5dFR involves more data dimensions, which incurs more disk access and consumes more storage space.

FIGURE 14. Index construction comparison between 3dFR and 5dFR: (a) volume of global index and (b) time cost for construction.
Based on the index instances of 3dFR and 5dFR, we generated different 5D query sets from the second to sixth features of the dataset, and sent these queries to 3dFR and 5dFR, respectively. The number of data items stored in each server was 20 000. Figure 15a and b shows the time cost of point queries and range queries, respectively. Notice that for 5dFR, the results returned by the local index need no pruning, while the search results in 3dFR must be pruned. Nevertheless, Fig. 15 shows that the time cost of both point queries and range queries of 3dFR was better than that of 5dFR under the same workloads. Besides, as the data volume increases, the increment of query time in 3dFR is smaller than that of 5dFR.

FIGURE 15. Query performance comparison between 3dFR and 5dFR: (a) point queries and (b) range queries.

Since the query efficiency in the global index of 5dFR is fairly similar to that of 3dFR, the reason why 3dFR outperformed 5dFR lies in the local index search. As mentioned before, the global index of FR-Index resides in memory, while the local index, much larger than the global index, is stored on disk. Searching the local index files of 5dFR incurs much more disk access than 3dFR. Therefore, 3dFR answers queries faster than 5dFR, which verifies the efficiency and validity of the indexing space selection strategy in our proposal.

7. CONCLUSION

This paper presents a distributed multi-dimensional indexing framework for switch-centric data centers with tree-like topologies. We design FR-Index, a two-layer multi-dimensional indexing system, and the corresponding query processing strategy to accelerate data retrieval in the Fat-tree topology. A cost model based on a Markov model and the Fat-tree routing protocol is proposed for dynamic index selection and updating.
We evaluated the performance of FR-Index on the Amazon EC2 platform with a real dataset and compared FR-Index with RT-CAN and RB-Index. Experiments validate that our proposal is scalable, efficient and lightweight, and performs well on switch-centric data centers. FUNDING This work was supported by the National Key R&D Program of China [2018YFB1004703]; the National Natural Science Foundation of China [61872238, 61672353]; the Shanghai Science and Technology Fund [17510740200]; the Huawei Innovation Research Program [HO2018085286]; and the State Key Laboratory of Air Traffic Management System and Technology [SKLATM20180X]. ACKNOWLEDGMENTS The authors would like to thank Yatao Zhang for his contribution to the early versions of this paper. REFERENCES 1 Ghemawat, S., Gobioff, H. and Leung, S. (2003) The Google File System. Proc. 19th ACM Symp. Operating Systems Principles, Bolton Landing, NY, USA, 19–22 October, pp. 29–43. ACM, New York, USA. 2 Lakshman, A. and Malik, P. (2010) Cassandra: a decentralized structured storage system. Oper. Syst. Rev., 44, 35–40. 3 DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P. and Vogels, W. (2007) Dynamo: Amazon's Highly Available Key-Value Store. Proc. 21st ACM Symp. Operating Systems Principles, Stevenson, WA, USA, 14–17 October, pp. 205–220. ACM, New York, USA. 4 Wang, J., Wu, S., Gao, H., Li, J. and Ooi, B.C. (2010) Indexing Multi-Dimensional Data in a Cloud System. Proc. ACM SIGMOD Int. Conf. Management of Data, Indianapolis, IN, USA, 6–10 June, pp. 591–602. ACM, New York, USA. 5 Wu, S., Jiang, D., Ooi, B.C. and Wu, K. (2010) Efficient B-tree based indexing for cloud data processing. PVLDB, 3, 1207–1218. 6 Wu, S. and Wu, K. (2009) An indexing framework for efficient retrieval on the cloud. IEEE Data Eng. Bull., 32, 75–82. 7 Hong, Y.
, Tang, Q., Gao, X., Yao, B., Chen, G. and Tang, S. (2016) Efficient R-tree based indexing scheme for server-centric cloud storage system. IEEE Trans. Knowl. Data Eng., 28, 1503–1517. 8 Al-Fares, M., Loukissas, A. and Vahdat, A. (2008) A Scalable, Commodity Data Center Network Architecture. Proc. ACM SIGCOMM Conf. Applications, Technologies, Architectures, and Protocols for Computer Communications, Seattle, WA, USA, 17–22 August, pp. 63–74. ACM, New York, USA. 9 Gao, L., Zhang, Y., Gao, X. and Chen, G. (2015) Indexing Multi-Dimensional Data in Modular Data Centers. Proc. 26th Int. Conf. Database and Expert Systems Applications, Valencia, Spain, 1–4 September, pp. 304–319. Springer, Berlin, Germany. 10 Gao, X., Li, B., Chen, Z., Yin, M., Chen, G. and Jin, Y. (2015) FT-INDEX: A Distributed Indexing Scheme for Switch-Centric Cloud Storage System. 2015 IEEE Int. Conf. Communications, London, UK, 8–12 June, pp. 301–306. IEEE, New York, USA. 11 Liu, Y., Gao, X. and Chen, G. (2015) A Universal Distributed Indexing Scheme for Data Centers with Tree-Like Topologies. Proc. 26th Int. Conf. Database and Expert Systems Applications, Valencia, Spain, 1–4 September, pp. 481–496. Springer, Berlin, Germany. 12 Chen, T., Gao, X. and Chen, G. (2016) The features, hardware, and architectures of data center networks: a survey. J. Parallel Distrib. Comput., 96, 45–74. 13 Jagadish, H.V., Ooi, B.C. and Vu, Q.H. (2005) BATON: A Balanced Tree Structure for Peer-to-Peer Networks. Proc. 31st Int. Conf. Very Large Data Bases, Trondheim, Norway, 30 August–2 September, pp. 661–672. ACM, New York, USA. 14 Wang, J., Wu, S., Gao, H., Li, J. and Ooi, B.C. (2010) Indexing Multi-Dimensional Data in a Cloud System. Proc. ACM SIGMOD Int. Conf. Management of Data, Indianapolis, IN, USA, 6–10 June, pp. 591–602. ACM, New York, USA. 15 Ratnasamy, S.
, Francis, P., Handley, M., Karp, R.M. and Shenker, S. (2001) A Scalable Content-Addressable Network. Proc. 2001 Conf. Applications, Technologies, Architectures, and Protocols for Computer Communications, San Diego, CA, USA, October, pp. 161–172. ACM, New York, USA. 16 Ding, L., Qiao, B., Wang, G. and Chen, C. (2011) An Efficient Quad-Tree Based Index Structure for Cloud Data Management. Proc. 12th Int. Conf. Web-Age Information Management, Wuhan, China, 14–16 September, pp. 238–250. Springer, Berlin, Germany. 17 Stoica, I., Morris, R.T., Liben-Nowell, D., Karger, D.R., Kaashoek, M.F., Dabek, F. and Balakrishnan, H. (2003) Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw., 11, 17–32. 18 Bin, H. and Yu-Xing, P. (2011) An Efficient Two-Level Bitmap Index for Cloud Data Management. Proc. Third IEEE Int. Conf. Communication Software and Networks, Xi'an, China, 27–29 May, pp. 509–513. IEEE, New York, USA. 19 Guo, C., Lu, G., Li, D., Wu, H., Zhang, X., Shi, Y., Tian, C., Zhang, Y. and Lu, S. (2009) BCube: A High Performance, Server-Centric Network Architecture for Modular Data Centers. Proc. ACM SIGCOMM 2009 Conf. Applications, Technologies, Architectures, and Protocols for Computer Communications, Barcelona, Spain, 16–21 August, pp. 63–74. ACM, New York, USA. 20 Guo, D., Chen, T., Li, D., Li, M., Liu, Y. and Chen, G. (2013) Expandable and cost-effective network structures for data centers using dual-port servers. IEEE Trans. Comput., 62, 1303–1317. 21 Sioutas, S., Sourla, E., Tsichlas, K. and Zaroliagis, C.D. (2015) ART++: A Fault-Tolerant Decentralized Tree Structure with Ultimate Sub-logarithmic Efficiency. Proc. First Int. Workshop on Algorithmic Aspects of Cloud Computing, Patras, Greece, 14–15 September, pp. 126–137. 22 Sioutas, S., Triantafillou, P.
, Papaloukopoulos, G., Sakkopoulos, E., Tsichlas, K. and Manolopoulos, Y. (2013) ART: sub-logarithmic decentralized range query processing with probabilistic guarantees. Distrib. Parallel Databases, 31, 71–109. 23 Sioutas, S., Sourla, E., Tsichlas, K. and Zaroliagis, C.D. (2015) D3-tree: A Dynamic Deterministic Decentralized Structure. Proc. 23rd Annual European Symposium on Algorithms, Patras, Greece, 14–16 September, pp. 989–1000. Springer, Berlin, Germany. 24 Brodal, G.S., Sioutas, S., Tsichlas, K. and Zaroliagis, C.D. (2015) D2-tree: a new overlay with deterministic bounds. Algorithmica, 72, 860–883. 25 Kokotinis, I., Kendea, M., Nodarakis, N., Rapti, A., Sioutas, S., Tsakalidis, A.K., Tsolis, D. and Panagis, Y. (2016) NSM-Tree: Efficient Indexing on Top of NoSQL Databases. Proc. Second Int. Workshop on Algorithmic Aspects of Cloud Computing, Aarhus, Denmark, 22 August, pp. 3–14. Springer, Berlin, Germany. 26 Ciaccia, P., Patella, M. and Zezula, P. (1997) M-Tree: An Efficient Access Method for Similarity Search in Metric Spaces. Proc. 23rd Int. Conf. Very Large Data Bases, Athens, Greece, 25–29 August, pp. 426–435. Morgan Kaufmann, MA, USA. 27 Schmidt, C. and Parashar, M. (2008) Squid: enabling search in DHT-based systems. J. Parallel Distrib. Comput., 68, 962–975. 28 Lee, J., Lee, H., Kang, S., Kim, S.M. and Song, J. (2007) CISS: an efficient object clustering framework for DHT-based peer-to-peer applications. Comput. Netw., 51, 1072–1094. 29 Ganesan, P., Yang, B. and Garcia-Molina, H. (2004) One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems. Proc. 7th Int. Workshop on the Web and Databases, Colocated with ACM SIGMOD/PODS, Paris, France, 17–18 June, pp. 19–24. ACM, New York, USA. 30 Zhang, R., Qi, J.
, Stradling, M. and Huang, J. (2014) Towards a painless index for spatial objects. ACM Trans. Database Syst., 39, 19:1–19:42. 31 Bertin-Mahieux, T., Ellis, D.P.W., Whitman, B. and Lamere, P. (2011) The Million Song Dataset. Proc. 12th Int. Society for Music Information Retrieval Conference, Miami, FL, USA, 24–28 October, pp. 591–596. University of Miami, Florida, USA. © The British Computer Society 2018. All rights reserved. Published as "Accelerate Data Retrieval by Multi-Dimensional Indexing in Switch-Centric Data Centers", The Computer Journal, 62(2), p. 301, February 2019. DOI: 10.1093/comjnl/bxy132.