Shortlisting the Influential Members of Criminal Organizations and Identifying their Important Communication Channels

Kamal Taha, Senior Member, IEEE and Paul D. Yoo, Senior Member, IEEE

K. Taha is with the Electrical and Computer Engineering Department, Khalifa University, UAE (e-mail: [email protected]). P. Yoo is with the CSIS within Birkbeck College at the University of London, United Kingdom (e-mail: [email protected]).

Abstract: Low-level criminals, who do the legwork in a criminal organization, are the most likely to be arrested, whereas the high-level ones tend to avoid attention. But crippling the work of a criminal organization is not possible unless investigators can identify the most influential, high-level members and monitor their communication channels. Investigators often approach this task by requesting the mobile phone service records of the arrested low-level criminals to identify contacts, and then they build a network model of the organization where each node denotes a criminal and the edges represent communications. Network analysis can be used to infer the most influential criminals and most important communication channels within the network, but screening all the nodes and links in a network is laborious and time consuming. Here we propose a new forensic analysis system called IICCC (Identifying Influential Criminals and their Communication Channels) that can effectively and efficiently infer the high-level criminals and short-list the important communication channels in a criminal organization, based on the mobile phone communications of its members. IICCC can also be used to build a network from crime incident reports. We evaluated IICCC experimentally and compared it with five other systems, confirming its superior prediction performance.

Index Terms: Forensic investigation tool, criminal network, mobile communications data, influential criminals, low-level criminals.

I. INTRODUCTION

Sanctioning and removing the leaders of a criminal organization can lead to a significant reduction in the organization's criminal activities [11, 32]. These criminals play crucial and decisive roles in controlling the flow of information within the organization. Digital forensics tools are widely used by criminal investigators to identify the leaders of criminal organizations. Some of these tools are inspired by social network analysis (SNA) [39] and others employ the k-clique technique [21, 41, 45]. For example, the k-clique technique was used to identify a group of hackers called Shadowcrew [29], and to identify a community of criminals from a large dataset of Canadian offenders [16].

Hierarchically structured criminal organizations mainly use mobile phones and emails to communicate, so access to mobile communications data (MCD) and email records allows the relationships among the members of such organizations to be depicted as networks, often allowing the identification of leaders and the most important communication channels [16, 18, 22, 28, 29, 30, 41, 45]. For example, the email addresses of Nigeria-based scammers were linked to their Facebook profiles and the k-clique technique was used to identify the scammers and their leaders, based on a social network with 40,000 nodes [18]. More recently, crime incident reports that involve criminal organizations have been used to depict the relationships among their members [8]. Most current approaches compute the relative importance of each node in a criminal network to infer the primary nodes representing the leaders, a strategy known as the relative importance problem [19]. These approaches are usually inspired by the k-clique technique [18, 20, 29, 48], network metrics [26], and semantic similarities [3, 25, 44]. However, such strategies generally cannot short-list the important communication channels in a network, and the channels need to be investigated directly to provide insight into the criminal organization and its influential members. There are numerous communication channels in criminal networks, and investigating all the channels is time-consuming and distracting when the investigators would prefer to focus on the key channels that reveal the most information.

In order to address this challenge, we have developed the IICCC system (Identifying Influential Criminals and their Communication Channels), which exploits the hierarchical structure of typical criminal organizations to infer not only the high-ranking members but also the vital communication channels, based on the propagation of information among nodes. The key contribution of this paper lies in identifying the vital communication channels in a criminal network. These vital channels involve a sequence of nodes that form part of one or more communication paths and depict influential criminals, who propagate important information diffused by a top-ranking member. These communication channels are likely to hold vital information about the criminal organization [49]. The ultimate goal of our proposed system IICCC is to identify these important communication channels. Since the information propagated through these channels originates from influential members in an organization, we also describe a methodology in the paper for identifying these influential members.

The authors of [23] observed that the change in the flow rate of information diffused by higher-level criminals is slower than that of lower-level criminals. This has led us to hypothesize that the steadiness of the information flow rate increases as the chain of command gets higher. That is, we hypothesize that influential leaders diffuse information at a steadier rate than less influential ones. IICCC identifies the top leaders of a criminal organization based on the above-mentioned hypothesis by computing the chi-squared (χ²) values [17, 34] of the network's nodes. Criminals represented by nodes with high χ² values are considered top leaders. Finally, IICCC identifies the important communication channels originating from these top leaders. These channels are likely to hold vital information about the criminal organization. An important communication channel is a sequence of influential criminals who propagate information diffused by a top leader. In the framework of IICCC, these important channels are identified by computing the χ² values of the communication channels connecting each two adjacent nodes that are part of a communication path originating from a top leader. The channels with large χ² values are considered critical communication channels.

II. MOTIVATION AND OUTLINE OF THE APPROACH

Low-level criminals who do the legwork are usually the easiest to catch and take into custody. Criminal investigators usually take advantage of these detained criminals as a bridge for identifying their top leaders [1, 55]. To do so, the investigators would ask mobile service providers for the MCD that belongs to these arrested low-level criminals, their callers, the callers of their callers, and so on. Then, they would build a communication network depicting the criminal organization. This is an effective approach, because the vast majority of criminal organizations nowadays contemplate their plots using mobile phones [21]. Many techniques have been proposed in the literature for identifying the relative importance of nodes in such networks [18, 25, 29, 48]. However, most of these techniques may not work well for short-listing the critical communication channels, which provide insight into criminal organizations and their influential members. This is because these techniques have been designed to identify the influential nodes in a network and not the critical communication channels that pass a high and steady rate of information diffused by these influential nodes. We introduce in this paper the forensic system IICCC, which can infer the high-level criminals and short-list the important communication channels in a criminal organization, based on the mobile phone communications of its members.

Identifying high-level criminals and short-listing the important communication channels originating from them can help investigators monitor the criminal organization. This monitoring can give an insight into the organization's work and the identity of a large number of the insiders and outsiders who deal with it. This will eventually lead to the arrest of these people, which will most likely result in at least crippling the organization's work. Thus, for the monitoring procedure to be effective, the influential members of a criminal organization should be identified. Sanctioning and removing (e.g., arresting) these influential leaders can lead to a significant reduction in the organization's criminal activities [11, 32].

We use the term "significant communication path" to denote a sequence of nodes in a criminal network representing influential criminals who propagate information diffused by a top-ranked leader in the criminal organization. We use the term "critical communication channel" to denote a portion of a "significant communication path" that has a high and steady rate of information flow relative to the other paths in the network. Let p be a "significant communication path" originating from a top-ranked leader r in a criminal organization. A "critical communication channel" denotes a sub-path p′ ⊆ p that has a high and steady flow rate of information diffused by r and propagated by influential criminals to lower-level criminals. Some of this information may be passed to p′ via other significant communication paths originating from r.

A "critical communication channel" is likely to hold vital information about a criminal organization. The ultimate goal of IICCC is to identify the "critical communication channels" in a network. To identify the "critical communication channels", IICCC identifies the influential nodes representing the top-ranked members as a precursor. To identify the top-ranked members, IICCC identifies the "significant communication paths" as a precursor.

IICCC infers the high-level criminals and the critical communication channels in a criminal organization by going through the following sequence of steps:

1) Construct a network based on either MCD or crime incident reports pertaining to the organization. Section III describes this process in detail.
2) Identify the significant communication paths originating from each node in the network. Section IV describes this process in detail.
3) Compute the χ² value of each node in a significant communication path, based on the actual (observed) and expected betweenness centralities of the nodes. Subsection V-A describes this process in detail.
4) Identify the influential criminals by ranking the nodes based on their χ² values. Subsection V-B describes this process in detail.
5) Identify the critical communication channels in the network by: (1) computing the actual (observed) betweenness centralities, expected betweenness centralities, and χ² values of the communication channels (edges) connecting each two adjacent nodes m and m′ that are part of a significant communication path originating from an influential node in the network; (2) computing the summation of the χ² values of the communication channels, which are part of the different significant communication paths that pass through m and m′; and (3) identifying the channels with large χ² values, which will be considered the critical communication channels. Section VI describes this process in detail.

Fig. 1 shows a process legend that visualizes the sequential processing steps taken by IICCC to identify influential leaders and "critical communication channels".

Fig. 1: A process legend that visualizes the sequential processing steps taken by IICCC to identify influential leaders and "critical communication channels".

III. COMPUTING NODE BETWEENNESS CENTRALITY

IICCC can construct networks from either MCD or crime incident reports. In an MCD network, a node denotes a caller/receiver criminal and an edge denotes the flow of information (phone calls, text messages or emails) between two criminals. IICCC adopts the space approach concept [7] to automatically build a network from crime incident reports [8]. We assume that criminals who appear in the same crime incident reports collaborate in committing crimes. Let e be an edge linking two nodes n1 and n2 in a network built from crime incident reports. Then e denotes the co-occurrence of the two criminals represented by n1 and n2 in the same crime incident reports.

Node betweenness has been used extensively to indicate the centralities of nodes in a network [45]. A node with a relatively large betweenness centrality acts as a bridge that controls the flow of information between the nodes of the network. Thus, the betweenness centralities of nodes are reflective of the relative influences of these nodes in the network. IICCC therefore computes the betweenness centralities of all nodes in a network to capture their relative influence over the information flow in the network. Several variations that capture the notion of node betweenness have been proposed. In one widely adopted measure [14], the betweenness B(v) of a node v is the fraction of the number of shortest paths from all nodes to all other nodes that pass through v, as defined in Equation 1:

$$B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}} \tag{1}$$

where $\sigma_{st}(v)$ is the number of shortest paths from node $s \in V$ to node $t \in V$ that pass through v, and $\sigma_{st}$ is the overall number of shortest paths from node s to node t.

Calculating the betweenness centralities of nodes in a network that has n nodes and k links (edges) using a breadth-first search (BFS) algorithm takes O(kn²) [42]. This is because computing the shortest path between two nodes takes O(k) and there are O(n²) pairs of nodes. A faster algorithm that calculates the betweenness centrality is based on the dependency accumulation technique [47]. Nodes are accessed in the reverse order compared to BFS. The algorithm takes O(kn) on an unweighted network, where k is the number of links. In the framework of IICCC, we use this technique to compute node betweenness, as defined in Equation 2:

$$B(v) = \sum_{s \neq v} \delta_{s\bullet}(v) \tag{2}$$

where B(v) is the betweenness of a node v, and $\delta_{s\bullet}(v)$ is the dependency of a node s on node v, as defined in Equation 3:

$$\delta_{s\bullet}(v) = \sum_{w:\, v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \bigl(1 + \delta_{s\bullet}(w)\bigr) \tag{3}$$

where $P_s(w)$ is the set of predecessors of a node w located on the shortest paths from node s, and $\sigma_{sv}$ is the overall number of shortest paths from node s to node v.

To illustrate the concepts proposed in this paper, we use a running example based on the network shown in Fig. 2. The network depicts communication attempts within a criminal organization based on its MCD, where a node denotes a criminal and an edge denotes the flow of information (phone calls and text messages) between two criminals. The network consists of 45 nodes and 103 edges (communication channels).

Example 1: Table 1 shows the betweenness centralities of the 45 nodes in the running network presented in Fig. 2.

Fig. 2: A network depicting the communication attempts within a criminal organization based on its MCD. The network contains 45 nodes representing 45 criminals in the organization. The network contains 103 edges representing 103 communication channels among the 45 criminals.

Table 1: The betweenness centrality of each of the 45 nodes in our running network presented in Fig. 2.

IV. IDENTIFYING THE SIGNIFICANT COMMUNICATION PATHS ORIGINATING FROM EACH NODE IN A NETWORK

IICCC identifies the significant communication paths originating from a node n as follows. Let S be the set of nodes that are directly connected to n by edges. For each node m in S, IICCC determines the significant communication path p originating from n and passing through m as follows. Let m′ be a node at hierarchical level l in p. Let S′ be the set of nodes at hierarchical level (l + 1) that are directly connected to m′ by edges. From among the nodes in S′, IICCC selects the node m″ that has the highest betweenness centrality. Thus, m″ becomes part of the path p. This process continues until the path p starts to converge to itself. That is, path p starts from n and ends at n′, which is the node in the path where the path starts to converge to itself.

Example 2: Let us identify the significant communication paths originating from node 6 in our running network shown in Fig. 2. The set of nodes that is directly connected to node 6 by edges is {5, 9, 13, 10, 3}, indicating five significant communication paths. For easy reference, we refer to these paths using the following notations: 6→5, 6→9, 6→13, 6→10 and 6→3. Fig. 3 shows these five channels. For example, path 6→13 is identified as follows:

• The set of nodes directly connected to node 13 by edges is {9, 5, 12, 17, 18, 19}. Node 18 has the largest betweenness centrality among the nodes in this set (Table 1).
• At level 2 of channel 6→13, the information therefore passes through node 18. The set of nodes directly connected to node 18 by edges is {21, 28, 22, 23, 14}. Node 21 has the largest betweenness centrality among the nodes in this set (Table 1).
• At level 3 of channel 6→13, the information therefore passes through node 21.
• Ultimately, the sequence (6→13→18→21→27→34→28→27) starts to converge to itself at node 27. This happens because (1) the information passes through node 28 at level 6; (2) node 27 has the largest betweenness centrality among the nodes that are directly connected to node 28; and (3) the information has already passed through node 27 at level 4.

Fig. 3: The five significant communication paths originating from node 6 in our running example network. Each path is assigned a different color.

V. IDENTIFYING THE INFLUENTIAL CRIMINALS IN THE NETWORK

Arresting the influential leaders of a criminal organization can lead to the reduction of the rate of information flow, which in turn destabilizes the criminal network [36]. The information diffused by these leaders is propagated by criminals in the chain of command [37]. As [35] indicates, the directives from the leaders of a criminal organization are transmitted from higher-level criminals to lower-level criminals in the chain of hierarchy. The authors of [23] observed that the change in the rate of information flow decreases as the number of the receivers of this information increases. As a result, these authors assume that the change in the flow rate of information diffused by higher-level criminals is slower than that of lower-level criminals. This finding conforms to several studies, which observed that as the influence of a criminal gets higher, the centrality score of the node representing this criminal increases [36] (i.e., a higher score of a node could be indicative of a larger number of receivers of the information diffused by this node). All the above revelations have led us to hypothesize that the steadiness of the information flow rate increases as the chain of command gets higher. That is, we hypothesize that influential leaders diffuse information at a steadier rate than less influential ones.

IICCC identifies the top leaders of a criminal organization based on the above-mentioned hypothesis. More specifically, if the difference between the actual and expected betweenness centralities of the nodes located in the significant communication paths originating from a node n is small, IICCC considers n as influential. To achieve this, IICCC uses χ² analysis [17, 34] to compute the goodness of fit of the observed and expected betweenness centralities of the nodes in a significant communication path. This is because χ² analysis can effectively determine whether an observed distribution conforms to any particular expected distribution. IICCC considers criminals represented by nodes with large χ² values as influential.

A. Computing the χ² Value of a Node

In IICCC, the degree of influence of a node n is represented by its χ² value [17, 34]. The degree of influence of n is the summed degrees of influence of the nodes in the significant communication paths originating from n. Thus, the χ² value of n is the overall χ² value of the nodes in the significant communication paths originating from n. That is, the χ² value of n is the summation of the χ² values of the paths originating from n. The χ² value of each of these paths is the summation of the χ² values of the nodes in the path. The χ² value of each of these nodes is determined as follows. Let p be a significant communication path originating from n. Let m be a node located at hierarchical level l of p. The χ² value of m is determined based on the betweenness centralities of (1) the nodes in p; and (2) the nodes at level l of the other paths originating from n, as defined in Equation 4 [17, 34]:

$$\chi^2 = \frac{(O - E)^2}{E} \tag{4}$$

where O is the observed betweenness centrality of the node, and E is the expected betweenness centrality of the node. The expected betweenness centrality ($E_n$) of a node located at level l of a path p originating from n is computed using Equation 5:

$$E_n = \frac{B_n^{p} \times B_n^{l}}{B_n^{all}} \tag{5}$$

where $B_n^{p}$ is the betweenness centrality of path p originating from node n; $B_n^{l}$ is the summation of the betweenness centralities of the nodes at level l of the paths originating from n; and $B_n^{all}$ is the summation of the betweenness centralities of the paths originating from n.

Let $p_u$ and $p_v$ be two significant communication paths originating from a node n under consideration. Let $p_u$ contain u levels and $p_v$ contain v levels. Let $p_u$ be the longest significant communication path originating from n. Thus, v < u. IICCC would assign a zero observed betweenness centrality (O) to each hypothetical node located at levels v + 1, v + 2, …, u of $p_v$.

Example 3: Let us compute the χ² value for node 6 in our running network as shown in Fig. 3. The left-hand portion of Table 2 shows the observed betweenness centralities (O) of the nodes comprising the five significant communication paths originating from node 6 (refer to Example 2). As Fig. 3 shows, path 6→3 is the longest path and contains nine levels. Given that each of the paths 6→5, 6→9 and 6→10 contains only eight levels, the hypothetical nodes at level 9 of these three paths are assigned zero observed betweenness centralities (Table 2). Because path 6→13 contains only seven levels, the hypothetical nodes at levels 8 and 9 of this path are assigned zero observed betweenness centralities (Table 2). The middle portion of Table 2 shows the expected betweenness centralities (E) of the nodes comprising the five paths. For example, the expected betweenness centrality of node 5 at level 1 of path 6→5 is calculated as follows:

$$E = \frac{1.303 \times 0.426}{6.022} = 0.092$$

The right-hand portion of Table 2 shows the χ² values of the nodes comprising the five paths, the summed χ² values of the five paths (bottom row), and the χ² value of node 6 (the bottom right-hand corner). For example, the χ² value of node 5 at level 1 of path 6→5 is calculated as follows:

$$\chi^2 = \frac{(0.085 - 0.092)^2}{0.092} = 0.001$$

The χ² value of a path is the summation of the χ² values of the nodes comprising it. For example, the χ² value of path 6→5 is 0.227 (Table 2). The χ² value of a node from which significant communication paths originate is the summation of the χ² values of these paths. For example, the χ² value of node 6 is the summation of the χ² values of the five paths (3.323; bottom right-hand corner of Table 2).

Table 2: Observed betweenness centralities, expected betweenness centralities, and χ² values of the nodes comprising the significant communication paths originating from node 6 as well as the overall χ² value of node 6 in our running network as described in Example 3.

B. Identifying the Influential Criminals

Nowadays, criminal organizations can be classified as either hierarchically well-structured [27] or hierarchically loosely structured [52]. Usually, a hierarchically well-structured organization operates under the directions of leaders of varying degrees of influence [27]. Hierarchically loosely structured organizations are usually composed of loosely connected groups of criminals [52]. Each group includes some influential criminals, who provide some type of direction to the group. The framework of IICCC can be applied to both the hierarchically well-structured and the loosely structured organizations, as long as their communication networks can be depicted. IICCC can help investigators to identify the influential criminals in these organizations by generating short-lists comprising small and tightly-defined groups of their most influential individuals. It does so by ranking the nodes in a criminal network based on their χ² values (Subsection V-A). Criminals represented by the top-ranked nodes are considered by IICCC as the most influential ones in the organization.

Example 4: Table 3 ranks the nodes in our running network based on their χ² values, which are calculated using the techniques described in Subsection V-A and Example 3. Investigators should focus on the top-ranked nodes, especially node 6.

Table 3: The nodes in our running network ranked based on their χ² values.

Rank  Node  χ²       Rank  Node  χ²       Rank  Node  χ²
1     6     3.323    16    32    2.037    31    11    1.236
2     34    2.916    17    35    1.909    32    17    1.216
3     29    2.908    18    37    1.897    33    44    1.145
4     3     2.894    19    14    1.831    34    33    1.123
5     38    2.822    20    20    1.803    35    8     1.1
6     21    2.815    21    15    1.772    36    41    0.926
7     12    2.753    22    4     1.699    37    9     0.921
8     43    2.677    23    19    1.688    38    28    0.858
9     13    2.562    24    16    1.678    39    23    0.852
10    18    2.523    25    30    1.527    40    40    0.848
11    22    2.401    26    25    1.431    41    7     0.592
12    39    2.274    27    31    1.344    42    2     0.568
13    36    2.261    28    26    1.289    43    1     0.38
14    27    2.136    29    5     1.25     44    42    0.363
15    10    2.113    30    11    1.236    45    45    0.254

VI. IDENTIFYING THE CRITICAL COMMUNICATION CHANNELS

Each member of a criminal organization O is controlled either directly or indirectly by one or more core influential criminals i. The information diffused by i is propagated to the members of O through communication channels. The order in which the criminals receive and propagate the information within these channels corresponds to their hierarchical influence in O [27]. Therefore, the portions of these channels that involve influential members of O are likely to hold vital information about O, hence their designation as critical communication channels. In IICCC, a critical communication channel is a sequence of influential criminals who form part of one or more significant communication paths that pass information diffused by a top-ranking member. In other words, let p be a significant communication path originating from a top influential criminal i. A critical communication channel is a sub-path (channel) p′ ⊆ p that encompasses a sequence of influential criminals, who propagate information through either (1) p alone, or (2) p and other significant communication paths originating from i.

Based on the betweenness centralities of the edges in a network [15], IICCC calculates the expected betweenness centralities and χ² values of the communication channels connecting each two adjacent nodes m and m′ that are part of a significant communication path originating from the node under consideration. Then IICCC computes the summation of the χ² values of the communication channels that are part of the different significant communication paths that pass through m and m′. For example, if there are k significant communication paths that pass through m and m′, we sum the χ² values of the channels of these paths between m and m′. That is, we sum the χ² values of the portion of these paths (channels) at their hierarchical levels between m and m′. This summation is considered the χ² value of the communication channel connecting m and m′. The channels with large χ² values are considered critical communication channels.

Example 5: Let us identify the critical communication channels originating from the most core influential node in our running network, which is node 6 (Table 3). Based on the betweenness centralities of the edges in the network [15] (the observed betweenness centralities of the edges), we calculate the expected betweenness centralities and χ² values of the communication channels connecting each two adjacent nodes that are part of the significant communication paths originating from node 6. As shown in Table 4, the χ² value of each communication channel connecting each two adjacent nodes m and m′ that are part of a significant communication path p is computed based on the hierarchical level of m and m′ within p.
For example, the communication channel at level 3 of path 6→5 (i.e., the portion of the path between nodes 13 and 18) has an observed betweenness centrality of 364.1, an expected betweenness centrality of 250.1, and a χ² value of 52. The overall χ² value of a communication channel connecting two adjacent nodes m and m′ is the summation of the χ² values of the communication channels connecting m and m′ that are part of the significant communication paths passing through m and m′. Fig. 4 shows the overall χ² value of each communication channel. For example, Table 5 shows the overall χ² value of the communication channel connecting nodes 13 and 18, which is computed as follows. As Fig. 3 shows, paths 6→5, 6→9, 6→13, and 6→3 pass through nodes 13 and 18 at different hierarchical levels: paths 6→5 and 6→9 pass through the two nodes at the third hierarchical level, path 6→13 passes through the two nodes at the second hierarchical level, and path 6→3 passes through the two nodes at the fourth hierarchical level. Therefore, the χ² values of the different communication channels connecting nodes 13 and 18 are computed based on the above hierarchical levels. Table 5 shows the corresponding hierarchical levels and χ² values. Accordingly, the overall χ² value of the channel connecting nodes 13 and 18 is 525.2, which was computed by summing the χ² values of the different channels. The critical communication channels are therefore those connecting the following nodes: (6, 13), (13, 18), and (18, 21). Although Fig. 4 shows there are 103 edges (communication channels) in the network, IICCC was able to short-list three critical communication channels, allowing investigators to focus on these as key targets.

Table 4: Observed betweenness centralities, expected betweenness centralities, and χ² values of each communication channel connecting each two adjacent nodes m and m′ that are part of a significant communication path p originating from node 6 in our running network. The values were computed based on the hierarchical level of m and m′ within p.

Table 5: The overall χ² value of the communication channel connecting nodes 13 and 18 computed by summing the χ² values of the different channels connecting these two nodes based on the hierarchical level of the two nodes.

Path                                     Hierarchical level of path between nodes 13 and 18    χ² value
Path 6→5                                 3                                                     52
Path 6→9                                 3                                                     86.6
Path 6→13                                2                                                     302.7
Path 6→3                                 4                                                     83.9
∑ (overall χ² value of channel 13-18)                                                          525.2

Fig. 4: The overall χ² value of each communication channel connecting each two adjacent nodes that are part of a significant communication path originating from node 6 (the most influential node) in our running network. The critical communication channels originating from node 6 are marked with thick solid arrows and the rest are marked with thick dotted arrows. The χ² values of each communication channel originating from node 6 are also shown.

VII. EXPERIMENTAL RESULTS

We implemented IICCC in Java and ran it on a computer featuring an Intel Core i7 processor (2.70 GHz) and 16 GB of RAM running Windows 10 Pro. We compared IICCC experimentally with five competing systems: Locality Sensitive Hashing (LSH) [40], SIIMCO [45], ECLfinder [41], CrimeNet Explorer [21], and LogAnalysis [13]. Each of these systems is briefly described below.

LSH [40] employs locality sensitive hashing techniques for identifying the important edges in a network. [40] used this technique for identifying energy-efficient paths. It applied energy-efficient for low importance edges and full fidelity for high importance edges.

SIIMCO [45] and ECLfinder [41] are systems that we previously proposed as tools to identify the leaders of a criminal organization based on MCD.
IICCC differs from SIIMCO and ECLfinder in that it adopts the significant communication path and critical communication channel concepts, whereas ECLfinder applies the existence dependency concept, and SIIMCO uses Fisher's exact test to assign a score to each node reflecting its relative importance in the network.

CrimeNet Explorer [21] uses Blockmodeling [5] and the shortest path algorithm to compute the relationships between nodes. It uses the closeness, degree and betweenness centrality metrics to identify the leaders of a criminal organization represented by a MCD network.

Finally, LogAnalysis [13] computes the edge betweenness for a network using the Girvan & Newman algorithm [15], and then uses the greedy algorithm [33] to cluster the network hierarchically. The top clusters confine the important nodes in the network.

A. Compiling Datasets for Evaluation

The algorithms were evaluated using three real-world communications datasets, namely the Caviar dataset [6, 31], the Enron email dataset [12], and Krebs's 9/11 dataset [50, 51]. We converted each dataset to a network depicting the communication attempts between the individuals incriminated in the incidents. Each dataset is briefly described below.

• The Caviar dataset [6, 31] records the communications of a drug-trafficking operation run by a Canadian gang called Caviar. The Caviar gang operated in Montreal, Canada, and dealt with importing and distributing hashish and cocaine. A network was created based on the phone calls among the drug traffickers between the years 1994 and 1996. The network consists of 110 nodes representing 110 gang members. Since the identities of the gang members have been kept confidential, the members are represented in the network by node designations (e.g., N1, N2, ...). The dataset is available at [9].

• The Enron email dataset [12, 24] is a corpus of email messages exchanged between top Enron employees and associates. The corpus came to light in 2001 following a criminal investigation into alleged white-collar crime within the Enron Corporation, and most of the emails revolve around this matter. The dataset consists of 619,446 messages exchanged between 158 Enron employees. After cleaning the data, we obtained 200,136 messages exchanged between 151 employees. The investigation of Enron wrongdoing incriminated 28 Enron employees and associates. The names and identities of these 28 employees have been released to the public. In our evaluations, we considered the identities of these 28 employees as ground-truth data. The raw corpus is currently available online at [12].

• Krebs's 9/11 dataset [50, 51] is a corpus depicting the interactions between the terrorists involved in the 9/11 incident. The 9/11 attacks were a series of four coordinated terrorist attacks on the United States on the morning of September 11, 2001. The dataset includes a network depicting interactions between the individuals incriminated in the terrorist attacks. The network contains 62 nodes depicting the terrorists implicated in the plot and 153 links (edges) depicting the interactions between them. The average degree of a node is 4.9. We considered the lists of the terrorists ranked by Krebs [50] based on the Degree, Betweenness, and Closeness centralities of the nodes representing them in the network as a ground-truth dataset.

B. Evaluating the Accuracy of Detecting Influential Nodes

We compared the influential nodes returned by each system with the corresponding ones returned by the standard Betweenness, Closeness, Out Degree, and In Degree centrality metrics [4, 54]. We compared the results in terms of the following standard quality metrics:

$$\text{Recall} = \frac{N_s^{c}}{N_m^{top}}, \qquad \text{Precision} = \frac{N_s^{c}}{N_s^{top}}, \qquad F\text{-value} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where $N_s^{c}$ is the number of correct nodes predicted by a system, $N_m^{top}$ is the actual number of correct nodes, and $N_s^{top}$ is the number of nodes predicted by a system. Let $L_m^{top}$ and $L_s^{top}$ be the lists predicted by a network metric and a system, respectively, ranked based on their influences. Then $N_s^{c} = |L_m^{top} \cap L_s^{top}|$ and $N_m^{top} = |L_m^{top}|$.

Accordingly, we computed the Recall, Precision, and F-value of each system with regard to each network centrality metric. Table 6 shows the results using the Caviar dataset [6, 31], Table 7 shows the results using the Enron email dataset [12], and Table 8 shows the results using Krebs's 9/11 dataset [50, 51]. Fig. 5 shows the overall average Recalls, Precisions, and F-values of the five systems. In Tables 6-8, the four row groups are labeled Out Degree Centrality, In Degree Centrality, Betweenness Centrality, and Closeness Centrality.

Table 6: The quality of the results predicted by each system, computed by comparing the system's top-ranked nodes with the corresponding ones predicted by the network metrics using the Caviar dataset [6, 31].

| Row group | System | Recall | Precision | F-value |
|---|---|---|---|---|
| 1 | IICCC | 0.62 | 0.66 | 0.64 |
| 1 | ECLfinder | 0.56 | 0.55 | 0.55 |
| 1 | SIIMCO | 0.57 | 0.60 | 0.58 |
| 1 | CrimeNet Explorer | 0.55 | 0.52 | 0.53 |
| 1 | LogAnalysis | 0.46 | 0.52 | 0.49 |
| 2 | IICCC | 0.64 | 0.53 | 0.58 |
| 2 | ECLfinder | 0.46 | 0.41 | 0.43 |
| 2 | SIIMCO | 0.59 | 0.47 | 0.52 |
| 2 | CrimeNet Explorer | 0.52 | 0.40 | 0.45 |
| 2 | LogAnalysis | 0.48 | 0.42 | 0.44 |
| 3 | IICCC | 0.68 | 0.74 | 0.76 |
| 3 | ECLfinder | 0.66 | 0.71 | 0.67 |
| 3 | SIIMCO | 0.68 | 0.62 | 0.65 |
| 3 | CrimeNet Explorer | 0.55 | 0.51 | 0.53 |
| 3 | LogAnalysis | 0.61 | 0.58 | 0.59 |
| 4 | IICCC | 0.73 | 0.69 | 0.71 |
| 4 | ECLfinder | 0.68 | 0.63 | 0.65 |
| 4 | SIIMCO | 0.62 | 0.57 | 0.59 |
| 4 | CrimeNet Explorer | 0.54 | 0.47 | 0.50 |
| 4 | LogAnalysis | 0.54 | 0.51 | 0.52 |

Table 7: The quality of the results predicted by each system, computed by comparing the system's top-ranked nodes with the corresponding ones predicted by the network metrics using the Enron dataset [12].

| Row group | System | Recall | Precision | F-value |
|---|---|---|---|---|
| 1 | IICCC | 0.57 | 0.54 | 0.55 |
| 1 | ECLfinder | 0.58 | 0.50 | 0.54 |
| 1 | SIIMCO | 0.52 | 0.46 | 0.49 |
| 1 | CrimeNet Explorer | 0.37 | 0.30 | 0.33 |
| 1 | LogAnalysis | 0.40 | 0.34 | 0.37 |
| 2 | IICCC | 0.55 | 0.48 | 0.51 |
| 2 | ECLfinder | 0.44 | 0.37 | 0.40 |
| 2 | SIIMCO | 0.46 | 0.39 | 0.42 |
| 2 | CrimeNet Explorer | 0.34 | 0.26 | 0.29 |
| 2 | LogAnalysis | 0.44 | 0.39 | 0.41 |
| 3 | IICCC | 0.72 | 0.67 | 0.69 |
| 3 | ECLfinder | 0.69 | 0.67 | 0.68 |
| 3 | SIIMCO | 0.64 | 0.61 | 0.62 |
| 3 | CrimeNet Explorer | 0.40 | 0.34 | 0.37 |
| 3 | LogAnalysis | 0.58 | 0.56 | 0.57 |
| 4 | IICCC | 0.69 | 0.73 | 0.71 |
| 4 | ECLfinder | 0.65 | 0.59 | 0.62 |
| 4 | SIIMCO | 0.61 | 0.52 | 0.56 |
| 4 | CrimeNet Explorer | 0.49 | 0.44 | 0.46 |
| 4 | LogAnalysis | 0.45 | 0.38 | 0.41 |

Table 8: The quality of the results predicted by each system, computed by comparing the system's top-ranked nodes with the corresponding ones predicted by the network metrics using Krebs's 9/11 dataset [50, 51].

| Row group | System | Recall | Precision | F-value |
|---|---|---|---|---|
| 1 | IICCC | 0.77 | 0.64 | 0.70 |
| 1 | ECLfinder | 0.66 | 0.61 | 0.63 |
| 1 | SIIMCO | 0.62 | 0.55 | 0.58 |
| 1 | CrimeNet Explorer | 0.54 | 0.58 | 0.56 |
| 1 | LogAnalysis | 0.51 | 0.49 | 0.50 |
| 2 | IICCC | 0.68 | 0.73 | 0.70 |
| 2 | ECLfinder | 0.59 | 0.57 | 0.58 |
| 2 | SIIMCO | 0.55 | 0.50 | 0.52 |
| 2 | CrimeNet Explorer | 0.49 | 0.43 | 0.46 |
| 2 | LogAnalysis | 0.39 | 0.43 | 0.41 |
| 3 | IICCC | 0.62 | 0.53 | 0.57 |
| 3 | ECLfinder | 0.66 | 0.68 | 0.67 |
| 3 | SIIMCO | 0.64 | 0.59 | 0.61 |
| 3 | CrimeNet Explorer | 0.52 | 0.46 | 0.49 |
| 3 | LogAnalysis | 0.52 | 0.54 | 0.53 |
| 4 | IICCC | 0.73 | 0.66 | 0.69 |
| 4 | ECLfinder | 0.71 | 0.67 | 0.69 |
| 4 | SIIMCO | 0.69 | 0.55 | 0.61 |
| 4 | CrimeNet Explorer | 0.57 | 0.51 | 0.54 |
| 4 | LogAnalysis | 0.66 | 0.61 | 0.63 |

Fig. 5: The overall average Recall, Precision, and F-value of the five systems.

We computed the overall average execution time of IICCC and the other four methods for identifying the influential nodes in the three ground-truth datasets described in section VII-A. Fig. 6 shows the results. As the figure shows, the computation time of IICCC is acceptable, and it is outperformed by only some of the other methods.

C. Evaluating the Accuracy of Detecting Critical Communication Channels

We compared IICCC with the technique proposed in [40] for identifying the important communication paths (e.g., the critical communication channels) in the Caviar dataset [6, 31]. The authors of [40] proposed a framework for identifying the important edges originating from influential nodes in a graph. To the best of our knowledge, the technique proposed in [40] is the closest to our work for identifying the important paths originating from the influential nodes in a network. ECLfinder, SIIMCO, CrimeNet Explorer, and LogAnalysis are not designed for identifying important paths. The technique in [40] employs locality sensitive hashing for identifying important edges, and we refer to it as LSH (Locality Sensitive Hashing) for easy reference. The authors of [40] used the LSH technique for identifying energy-efficient paths, applying energy-efficient processing to low-importance edges and full fidelity to high-importance edges.

We used the Caviar dataset [6, 31] described in section VII-A for the evaluation. The most influential nodes in the Caviar dataset have been designated as nodes N1, N12, and N3 [10]. The gang member represented by node N1 was heading the hashish drug trafficking. The gang member represented by node N12 was heading the cocaine drug trafficking. The gang member represented by node N3 was the intermediary between N1 and N12 as well as between the two and non-traffickers [10]. Therefore, we considered the paths originating from nodes N1, N12, and N3 as ground-truth critical communication channels. We used the detection accuracy (Acc) formula shown in Equation 6 as a metric for the evaluation. Fig. 7 shows the results.

$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6)$$

• TP (True positive): Number of paths correctly predicted as non-critical.
• FP (False positive): Number of paths incorrectly predicted as non-critical.
• TN (True negative): Number of paths correctly predicted as critical.
• FN (False negative): Number of paths incorrectly predicted as critical.

Fig. 7: The detection accuracy of IICCC and LSH [40] for identifying the critical communication channels in the Caviar dataset [6, 31].
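The quality metrics of subsection B and the detection accuracy of Equation 6 can be sketched as follows. This is a minimal Python illustration, not the evaluation code used in the experiments. It assumes that the number of correct nodes is the size of the intersection of the two top-ranked lists. The function names, the node lists, and the path counts are hypothetical values chosen only to show the arithmetic.

```python
def precision_recall_f(system_top, metric_top):
    """Return (precision, recall, f_value) for two top-ranked node lists."""
    # Correct nodes: predicted by the system and also ranked by the metric.
    correct = len(set(system_top) & set(metric_top))
    precision = correct / len(system_top) if system_top else 0.0
    recall = correct / len(metric_top) if metric_top else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


def accuracy(tp, tn, fp, fn):
    """Detection accuracy: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one path must be evaluated")
    return (tp + tn) / total


# Hypothetical top-ranked lists (node names are illustrative only).
system_top = ["N1", "N3", "N12", "N76"]
metric_top = ["N1", "N3", "N12", "N85", "N87"]
p, r, f = precision_recall_f(system_top, metric_top)

# Hypothetical path counts, not taken from the experiments.
acc = accuracy(tp=8, tn=80, fp=7, fn=5)

print(p, r, round(f, 3))  # 0.75 0.6 0.667
print(acc)                # 0.88
```

Because accuracy sums the correct predictions of both classes, its value does not depend on which of the two classes (critical or non-critical) is treated as the positive one.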
Fig. 6: Overall average execution time for identifying the influential nodes in the three ground-truth datasets described in section VII-A.

D. Comparing the Systems in terms of Euclidean Distances

In this test we aim at measuring the degree of conformity between the predictions made by IICCC and the corresponding predictions made by the standard network metrics. That is, we evaluated the accuracy of IICCC in terms of the distances between: (1) the position of each node m in a list ranked by IICCC according to the influences of nodes in the network, and (2) the position of the same node m in a list ranked by a standard network metric according to the influences of nodes in the network. A ranking of nodes is a permutation of the integers 1, 2, .... Intuitively, the smaller the distance, the better IICCC performs. In particular, we are interested in measuring the distances between the positions of IICCC's top-ranked nodes and the positions of the same nodes in the lists ranked by the standard network metrics. Towards this, we computed the average Euclidean distances between the positions of the top n nodes in the lists ranked by IICCC, and the corresponding positions of the same nodes in the lists ranked by the standard network metrics, where n is considered to be 5, 10, or 15. We employed the Euclidean measure shown in Equation 7. Fig. 8 plots the average Euclidean distance for each system using the three datasets described in subsection VII-A.

$$d(\sigma_m, \sigma_s) = \sum_{v \in N_m^{top}} |\sigma_m(v) - \sigma_s(v)| \qquad (7)$$

where $N_m^{top}$ is the list of the top n nodes predicted by network metric m, $\sigma_m \in [0,1]^{|N_m^{top}|}$ is the list of the ranked top n nodes predicted by network metric m, $\sigma_s \in [0,1]^{|N_m^{top}|}$ is the list of the ranked top n nodes predicted by a system s, and $\sigma_m(v)$ and $\sigma_s(v)$ are the positions in the lists $\sigma_m$ and $\sigma_s$ of node $v \in N_m^{top}$.

Fig. 8: The average Euclidean distance for each system using the three datasets described in subsection VII-A.

E. Discussion of the Results

As shown in Fig. 7, IICCC outperformed LSH [40] in terms of detection accuracy of the critical communication channels in the Caviar dataset [6, 31]. Based on our observation of the experimental results, we attribute this to the combination of the "significant communication paths" and "critical communication channels" concepts employed by IICCC. The important paths identified by the locality sensitive hashing technique employed by [40] are equivalent to the "significant communication paths" identified by IICCC. However, the better performance of IICCC over [40] stems from IICCC's use of the "critical communication channels" concept, which short-lists the "significant communication paths" based on their importance in networks.

As Tables 6-8 and Figs. 5 and 8 show, IICCC outperformed SIIMCO, ECLfinder, CrimeNet Explorer, and LogAnalysis in identifying the influential nodes in the datasets. After studying the experimental results, we attributed the superior performance of IICCC to the following factors:

a) IICCC returned eight Enron employees as the most influential ones in the organization. Five out of these eight employees are publicly known to be the most involved ones in the crime, according to the investigation and the sentencing records that have been released to the public domain. They were charged and found guilty of various conspiracy and accounting frauds. These five employees are: Arthur Andersen (auditor), Andrew Fastow (financial officer), Kenneth Lay (CEO), Rick Causey (Chief Accounting Officer), and Jeffrey Skilling (COO). On the other hand, ECLfinder and SIIMCO correctly identified four of them, LogAnalysis correctly identified three of them, and CrimeNet Explorer correctly identified only one of them.

b) IICCC determined the influence of each node in the evaluation networks by considering not only the node's number of links, but also the relative influences of the nodes connected to the node (using the concept of significant communication paths). IICCC also considered the relative influences of the edges connected to the node (using the concept of critical communication channels).

c) CrimeNet Explorer assigned a weight for each node based on its topology in the evaluation networks with regard to the node n under consideration. Therefore, these nodes did not contribute equally to the influence of n, a phenomenon described as incomplete contribution. This is one of the key limitations of CrimeNet Explorer.

d) When identifying the influences of the nodes in the evaluation networks, LogAnalysis considered only the weights of the edges connected to these nodes. This is one of the key limitations of LogAnalysis.

e) SIIMCO and ECLfinder generated a much larger number of correctly predicted influences of nodes in the parts of the evaluation networks that have less dense connections. That is, they did not perform as well in dense parts. This is one of the key limitations of SIIMCO and ECLfinder.

Comparing IICCC with the real-world drug trafficking operation that identified the Caviar Canadian gang [6, 31] can give supporting evidence of the possibility of applying the IICCC method to solve real-world practical problems. The similarities between IICCC and the Caviar operation can be summarized as follows:

• The Caviar operation constructed the network that depicts the drug traffickers based on the phone calls exchanged between them. Similarly, IICCC constructs a network depicting a criminal organization based on the phone calls exchanged between the criminals in the organization.

• The Caviar operation identified the influential criminals in the network (i.e., the criminals represented by nodes N1, N12, and N3) in order to monitor them and to gain insight into the traffickers' work. Similarly, IICCC identifies the influential criminals in a criminal organization in order to monitor them and to short-list the important communication channels originating from them. This monitoring can give an insight into the organization's work and the identity of a large number of insiders and outsiders who deal with the organization. This will eventually lead to the arrest of these people, which will most likely result in at least crippling the organization's work.

VIII. CONCLUSION

Many techniques have been proposed to determine the relative importance of nodes. However, most of these techniques do not generate useful short-lists of critical communication channels in a criminal organization. We have developed a forensics system called IICCC that can (1) infer the top influential criminals in a criminal organization, and (2) short-list the vital communication channels in the criminal organization. IICCC can create a network from either MCD or crime incident reports. IICCC employs the following concepts: (1) "significant communication paths", denoting sequences of influential criminals who propagate information diffused by the criminal under investigation to other criminals, and (2) "critical communication channels", which are sub-paths linking two adjacent influential criminals in the network and passing vital information from the top influential criminals to others lower in the hierarchy. IICCC identifies the influential criminals and significant communication paths by computing the χ² value of each node in a significant communication path.

We compared IICCC experimentally with LSH [40], SIIMCO [45], ECLfinder [41], CrimeNet Explorer [21], and LogAnalysis [13]. For the evaluations, we used the following datasets: Caviar dataset [6, 31], Enron email dataset [12, 24], and Krebs's 9/11 dataset [50, 51]. The experimental results showed that IICCC has a superior prediction performance.

The key contribution of the paper lies in identifying the vital communication channels in a criminal network. That is, the key contribution of IICCC is to generate useful short-lists of the critical communication channels in a criminal organization. This is because there are numerous communication channels in criminal networks, and investigating all the channels is time-consuming and distracting, when the investigators would prefer to focus on the key channels that reveal the most information.

The experimental results confirmed the usefulness of shortlisting critical communication channels. For example, in the Caviar dataset used in our experiments there are 86, 138, and 227 communication channels connected directly and indirectly to the most influential nodes designated by nodes N1, N12, and N3 respectively. Out of these channels, IICCC identified only 9, 5, and 14 as critical communication channels connected to N1, N12, and N3 respectively.

Even though the paper focuses on identifying the important communication paths in criminal networks, the framework of IICCC can also be used to solve many other practical real-world problems depicted in the form of networks, such as the following:

• In social networks, it can be used for detecting communities. A number of studies successfully identified the boundaries of communities based on the strength of the information flow in the paths/edges connecting their nodes [43].
• In metabolic networks, it can be used for identifying functionally-related units. A fully described path between two units represents the dynamics and dependencies among them [46].
• In energy conservation computation, it can be used for identifying energy-efficient paths. A path in an energy network that passes energy with high flow rate is likely to utilize energy effectively [38].
• In health care, it can be used for quantifying the impact of human mobility in spreading infectious diseases [53] (e.g., an influential path depicts a route with high human mobility).
• In urban planning, it can be used for identifying congested roads [53] (e.g., an influential path depicts a congested road).

REFERENCES

[1] A. Milani Fard, M. Ester, "Collaborative Mining in Multiple Social Networks Data for Criminal Group Discovery", IEEE International Conference on Social Computing (SocialCom), 2009.
[2] Baker, W. E. and Faulkner, R. R. 1993. "The social organization of conspiracy: Illegal networks in the heavy electrical equipment industry". Amer. Soc. Rev. 58, 837–860.
[3] Baldi, P. and Hatfield, W. (2002), "DNA Microarrays and Gene Expression", Cambridge University Press, Cambridge, UK.
[4] Borgatti, Stephen P. (2005). "Centrality and Network Flow". Social Networks. Elsevier. 27: 55–71.
[5] Breiger, R. L., Boorman, S. A., and Arabie, P. 1975. "An algorithm for clustering relational data, with applications to social network analysis and comparison with multidimensional scaling". J. Math. Psych. 12, 328–383.
[6] Bahulkar, A., Szymanski, B., Baycik, N., and Sharkey, T. "Community Detection with Edge Augmentation in Criminal Networks". 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Spain, 2018.
[7] Chen, H. and Lynch, K. J. 1992. "Automatic construction of networks of concepts characterizing document databases". IEEE Trans. Syst. Man Cybernet. 22, 885–902.
[8] Chen, H., Zeng, D., Atabakhsh, H., Wyzga, W., and Schroeder, J. 2003. "Coplink: Managing law enforcement data and knowledge". Commun. ACM 46, 28–34.
[9] Caviar dataset. Available at: https://sites.google.com/site/ucinetsoftware/datasets/covert-networks/caviar
[10] C. Morselli and K. Petit, "Law-Enforcement Disruption of a Drug Importation Network", Global Crime, vol. 8, Issue 2, 2007.
[11] Díaz, C., Patacchini, E., Verdier, T., Zenou, Y. "The influence of leaders on criminal decisions". VOX, CEPR's Policy Portal, 2018.
[12] Enron Email Dataset. Available at: http://www-2.cs.cmu.edu/~enron/.
[13] E. Ferrara, P. De Meo, S. Catanese, and G. Fiumara, "Detecting criminal organizations in mobile phone networks," Expert Systems with Applications, vol. 41, no. 13, pp. 5733–5750, 2014.
[14] Freeman, L. C. (1977). "A set of measures of centrality based on betweenness". Sociometry, 40(1), 35-41.
[15] Girvan, M., Newman, M. (2002). "Community structure in social and biological networks". Proceedings of the National Academy of Sciences, 99(12), 7821.
[16] Glässer, "Estimating Possible Criminal Organizations from Co-offending Data". Public Safety Canada, 2012.
[17] Greenwood, P. E., Nikulin, M.S. (1996), "A guide to chi-squared testing". Wiley, New York. ISBN 0-471-55779-X
[18] H. Sarvari, E. Abozinadah, A. Mbaziira, and D. McCoy, "Constructing and analyzing criminal networks", Proc. IWCC, USA, 2014, pp. 84–91.
[19] H. Wang, C. K. Chang, H.-I. Yang, and Y. Chen, "Estimating the relative importance of nodes in social networks," Journal of Information Processing, vol. 21, no. 3, pp. 414–422, 2013.
[20] J. Pattillo, N. Youssef, and S. Butenko, "Clique relaxation models in social network analysis," in Handbook of Optimization in Complex Networks. Springer, 2012, pp. 143–162.
[21] J. J. Xu and H. Chen, "CrimeNet explorer: A framework for criminal network knowledge discovery," ACM Trans. Inf. Syst., vol. 23, no. 2, pp. 201–226, Apr. 2005.
[22] Klerks P., "The Network Paradigm Applied to Criminal Organisations: Theoretical nitpicking or a relevant doctrine for investigators? Recent developments in the Netherlands", Connections 24(3): 53-65, 2001.
[23] Kathleen C., Matthew D., Max T., Jeffrey R., Natasha K. "Destabilizing Dynamic Covert Networks". Proc. ICCRTS 2003, Washington DC, USA.
[24] Keila, S. and D. Skillicorn (2005), "Structure in the Enron email dataset", Computational & Mathematical Organization Theory, 11(3), 183–99.
[25] L. Langohr, "Methods for finding interesting vertices in weighted graphs," Ph.D. dissertation, 2014.
[26] L. C. Freeman, "Centrality in social networks conceptual clarification," Social networks, vol. 1, no. 3, pp. 215–239, 1979.
[27] Leeson, Peter T., and Douglas B. Rogers. 2012. "Organizing Crime." Supreme Court Economic Review 20: 89-123.
[28] Mcandrew, D. 1999. The structural analysis of criminal networks. In The Social Psychology of Crime: Groups, Teams, and Networks. D. Canter and L. Alison, Eds. Dartmouth Publishing, Aldershot, UK, 53–94.
[29] M. Akbas, R. Avula, M. Bassiouni, and D. Turgut, "Social network generation and friend ranking based on mobile phone data", IEEE International Conference on Communications, Budapest, Hungary, 2013.
[30] Memon, Bisharat, "Identifying Important Nodes in Weighted Covert Networks Using Generalized Centrality Measures". 2012 European Intelligence and Security Informatics Conference, Odense, Denmark.
[31] Morselli, Carlo, & Cynthia Giguere, "Legitimate strengths in criminal networks", Crime, Law and Social Change 45(3), 2006, 185-200.
[32] Mansour, Abdala, Nicolas Marceau, and Steve Mongrain, "Gangs and Crime Deterrence," Journal of Law, Economics & Organization, Oct 2006, Vol. 22 Issue 2, p 315-339.
[33] Newman, M. (2004). "Fast algorithm for detecting community structure in networks". Physical Review E, 69(6), 066133.
[34] Nikulin, M.S. (1973). "Chi-squared test for normality". In: Proceedings of the International Vilnius Conference on Probability Theory and Mathematical Statistics, v.2, pp. 119–122.
[35] Phil W., "Transnational Criminal Networks". In Networks and Netwars: The Future of Terror, Crime, and Militancy, ed. Arquilla and Ronfeldt.
[36] P. A. C. Duijn, V. Kashirin, and P. M. A. Sloot, "The relative ineffectiveness of criminal network disruption," Scientific Reports, vol. 4, article 4238, 2014.
[37] P Dey, S Roy. "Centrality based information blocking and influence minimization in online social network". 2017 IEEE ANTS, India.
[38] P. Plonski, P. Tokekar, V. Isler, "Energy-efficient path planning for solar-powered mobile robots". J. of Field Robotics, Vol. 30, pp. 583-601, 2013.
[39] Shang, X., Yuan, Y. Social Network Analysis in Multiple Social Networks Data for Criminal Group Discovery. Proc. 2012 CyberC, Sanya, China.
[40] S. M. Faisal, G. Tziantzioulis, A. M. Gok, N. Hardavellas, S. Ogrenci-Memik, and S. Parthasarathy, "Edge importance identification for energy efficient graph processing," in 2015 IEEE Big Data, 2015, pp. 347-354.
[41] Taha, K. and Yoo, P. "Using the Spanning Tree of a Criminal Network for Identifying its Leaders". IEEE Transactions on Information Forensics & Security, 2016, 12 (2), pp. 445-453.
[42] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, "Introduction to Algorithms". MIT Press, Cambridge, MA, 2nd edition (2001).
[43] Taha, K. "Disjoint Community Detection in Networks based on the Relative Association of Members". IEEE Transactions on Computational Social Systems, 2018, Vol. 5, issue 2.
[44] Tversky A: Features of Similarity. Psycholog. Rev 1977, 84:327-352
[45] Taha, K. and Yoo, P. "SIIMCO: A Forensic Investigation Tool for Identifying the Influential Members of a Criminal Organization". IEEE Transactions on Information Forensics & Security, 2015, Vol. 11, issue 4, pp. 811-822.
[46] Taha, K. "Inferring the Functions of Proteins from the Interrelationships between Functional Categories". IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2016, Vol. 15, issue 1, pp. 157-167.
[47] U. Brandes, "A faster algorithm for betweenness centrality," Journal of Mathematical Sociology, vol. 25, no. 2, pp. 163–177, 2001.
[48] U. Glässer, "Estimating Possible Criminal Organizations from Co-offending Data". Public Safety Canada, 2012.
[49] U. K. Wiil, J. Gniadek, N. Memon; "Measuring Link Importance in Terrorist Networks. Social Network Analysis", International Conference On Advances in Social Networks Analysis and Mining, ASONAM 2010.
[50] V. E. Krebs, "Uncloaking terrorist networks," First Monday, vol. 7, pp. 4–11, 2002.
[51] V. E. Krebs, "Mapping networks of terrorist cells," Connections, vol. 24 (3), pp. 43–52, 2002.
[52] Victor E. Kappeler and Gary W. Potter. "The Mythology of Crime and Criminal Justice", 5th edition, Waveland Press, Inc., 2017.
[53] Vazquez-Prokopec G et al. "Using GPS technology to quantify human mobility, dynamic contacts and infectious disease dynamics in a resource-poor urban environment". PLoS ONE 8, 2013, e58802.
[54] Wasserman, S. and Faust, K. (1994). "Social Network Analysis: Methods and Applications". Cambridge University Press.
[55] Yang, L. "Based on social network crime organization relation mining and central figure determining". 2012 IEEE International Conference on Computer Science and Automation Engineering, South Korea, June 2012.

Kamal Taha has been an Associate Professor in the Department of Electrical and Computer Engineering at Khalifa University, UAE, since 2010. He received his Ph.D. in Computer Science from the University of Texas at Arlington, USA. He has over 80 refereed publications that have appeared in prestigious top ranked journals, conference proceedings, and book chapters. Over 20 of his publications have appeared in IEEE Transactions journals. He was an Instructor of Computer Science at the University of Texas at Arlington, USA, from August 2008 to August 2010. He worked as Engineering Specialist for Seagate Technology, USA, from 1996 to 2005 (Seagate is a leading computer disc drive manufacturer in the US). His research interests span bioinformatics, defect characterization of semiconductor wafers, Information Forensics & Security, information retrieval, data mining, and databases, with an emphasis on making data retrieval and exploration in emerging applications more effective, efficient, and robust. He serves as a member of the Program Committee, editorial board, and review panel for a number of international conferences and journals, some of which are IEEE and ACM journals. He is a Senior Member of the IEEE.

Paul Yoo is currently with the CSIS within Birkbeck College at the University of London. Prior to this, he held academic/research posts in Cranfield (Defence Academy of the UK), Sydney (USyd), South Korea (KAIST) and the UAE (Khalifa). In his short career, he has amassed more than 60 prestigious journal and conference publications, has been awarded more than US$ 2.3 million in project funding, and has received a number of prestigious international and national awards for his work in advanced data analytics, machine learning and secure systems research, notably the IEEE Outstanding Leadership Award, Capital Markets CRC Award, Emirates Foundation Research Award, and the ICT Fund Award. Most recently, he won the prestigious Samsung award for research to protect IoT devices. He serves as an Editor of IEEE COMML and Journal of Big Data Research (Elsevier). He is also affiliated with the University of Sydney and Korea Advanced Institute of Science and Technology (KAIST) as a Visiting Professor. He is a Senior Member of the IEEE.
IEEE Transactions on Information Forensics and Security – Unpaywall
Published: Aug 1, 2019