TY - JOUR
AU - Rahmann,, Sven
AB - Abstract Protein interactions are fundamental building blocks of biochemical reaction systems underlying cellular functions. The complexity and functionality of these systems emerge not only from the protein interactions themselves but also from the dependencies between these interactions, as generated by allosteric effects or mutual exclusion due to steric hindrance. Therefore, formal models for integrating and utilizing information about interaction dependencies are of high interest. Here, we describe an approach for endowing protein networks with interaction dependencies using propositional logic, thereby obtaining constrained protein interaction networks (“constrained networks”). The construction of these networks is based on public interaction databases as well as text-mined information about interaction dependencies. We present an efficient data structure and algorithm to simulate protein complex formation in constrained networks. The efficiency of the model allows fast simulation and facilitates the analysis of many proteins in large networks. In addition, this approach enables the simulation of perturbation effects, such as knockout of single or multiple proteins and changes of protein concentrations. We illustrate how our model can be used to analyze a constrained human adhesome protein network, which is responsible for the formation of diverse and dynamic cell–matrix adhesion sites. By comparing protein complex formation under known interaction dependencies versus without dependencies, we investigate how these dependencies shape the resulting repertoire of protein complexes. Furthermore, our model enables investigating how the interplay of network topology with interaction dependencies influences the propagation of perturbation effects across a large biochemical system. Our simulation software CPINSim (for Constrained Protein Interaction Network Simulator) is available under the MIT license at http://github.com/BiancaStoecker/cpinsim and as a Bioconda package (https://bioconda.github.io). Insight, innovation, integration Protein interactions enable cellular biochemical reaction systems and thereby cellular functions. Not only the protein interactions themselves, but also the dependencies between these, e.g., allosteric effects or mutual exclusions, contribute to complexity and functionality. We present an approach for endowing protein networks with interaction dependencies using propositional logic, thereby obtaining “constrained protein interaction networks”. Further, we present an efficient data structure and algorithm to simulate protein complex formation. This allows a fast simulation, the analysis of large networks and simulation of perturbation effects (knockout or overexpression of proteins, protein concentration changes). We find that the interaction dependencies limit the resulting complex sizes. We demonstrate how the interplay of network topology and interaction dependencies influences the propagation of perturbation effects. Introduction A central goal in cell biology and systems biology is to understand how cellular functions emerge from the collective action of interacting proteins. High-throughput protein–protein interaction detection techniques, such as yeast two-hybrid and mass spectrometry,1–3 can provide static snapshots of complete interactomes, as demonstrated with several organisms.4,5 In addition, computational methods for interaction detection have been proposed.6–9 The obtained interaction information is typically modeled as networks, i.e. undirected graphs with nodes and edges corresponding to the proteins and their pairwise physical interactions, respectively.10,11 However, this approach neglects a fundamental feature of protein networks: interactions between proteins are dependent on each other. A key mechanism generating interaction dependencies is allosteric regulation, in which a protein undergoes conformational change upon one interaction which affects its capability to bind other proteins.12 Another major mechanism for interaction dependencies is mutual exclusiveness arising from steric hindrance that prevents proteins from binding simultaneously to too-close, overlapping, or identical protein domains (Fig. 1). Fig. 1 View largeDownload slide Interaction dependencies confine the set of protein interactions which are simultaneously possible. Left: two or more proteins (here, FYN and SRC) can compete on the same binding domain on a host protein (CBL), leading to the constraints {(CBL, d3), (SRC, d2)} ⇒ ¬{(CBL, d3), (FYN, d1)} and {(CBL, d3), (FYN, d1)} ⇒ ¬{(CBL, d3), (SRC, d2)}. Right: a given interaction (here, between CBL and SRC) can depend on another, allosteric, interaction (between CBL and ABL1) that induces a conformational change in the host protein (CBL). This is represented by the constraint {(CBL, d3), (SRC, d2)} ⇒ {(CBL, d4), (ABL, d5)}. Fig. 1 View largeDownload slide Interaction dependencies confine the set of protein interactions which are simultaneously possible. Left: two or more proteins (here, FYN and SRC) can compete on the same binding domain on a host protein (CBL), leading to the constraints {(CBL, d3), (SRC, d2)} ⇒ ¬{(CBL, d3), (FYN, d1)} and {(CBL, d3), (FYN, d1)} ⇒ ¬{(CBL, d3), (SRC, d2)}. Right: a given interaction (here, between CBL and SRC) can depend on another, allosteric, interaction (between CBL and ABL1) that induces a conformational change in the host protein (CBL). This is represented by the constraint {(CBL, d3), (SRC, d2)} ⇒ {(CBL, d4), (ABL, d5)}. The dependencies between interactions have profound impact on the properties of a protein network, as they constrain the set of possible protein complexes and their possible paths to assemble.13–16 Moreover, interaction dependencies enable perturbations of one interaction to propagate along the network and affect other interactions.15,17–19 Therefore, considering the dependencies between protein interactions and inferring their collective impact is essential for understanding the design and function of intracellular protein networks. An example of a bioinformatics application which benefits directly from the incorporation of dependencies is the prediction of protein complexes. Various approaches for inferring protein complexes in silico at large scales were developed and applied.20 These approaches usually rely on detecting dense regions within the plain protein network via clustering algorithms.21–24 Several studies demonstrated, as can be expected, that considering mutual exclusiveness between interactions improves the quality of such protein complex prediction.25–28 However, so far no approach appears to exist that takes arbitrary types of dependencies into account. Ultimately, a complete quantitative biochemical description of the whole biochemical system, including the concentrations and spatial distribution of all involved proteins and the kinetic constants of their interactions is desirable.29–31 However, despite the progress in technologies for measuring these parameters in living cells, a complete description of large intracellular biochemical systems is still beyond reach. Moreover, even given a detailed description of such a complex system, insightful simulations and modeling remain challenging with current computational technology.31,32 Therefore this approach is fundamentally difficult even for small protein networks.33,34 A valuable simplification of this challenge can be achieved based on the observation that mutual exclusiveness and allosteric regulations typically lead to all-or-none changes in the state of the target protein interaction, and therefore can be viewed as boolean logic dependencies between protein interactions. Accordingly, logic-based models have been successfully used for the analysis of signaling networks.35 In comparison to finding interactions between proteins, identifying the dependencies between these interactions is more challenging. In order to infer mutual exclusiveness between the binding of two proteins to a third one, their minimal binding domains should be identified and found to be at least partially overlapping. However, in case of non-overlapping binding sites which are in close proximity, structural information has to be considered in order to determine if there is a steric hindrance between the binding proteins.17,18,36,37 Similarly, structural comparisons between proteins that are known to interact with a common protein enable to infer probabilities for mutually exclusive interactions in a protein network.38 Advances in computational protein–protein docking enable to infer protein interactions, and hence with sufficient structural resolution it can also indicate competing interactions throughout a protein network.39–43 In addition to the experimental and computational challenges to identify dependencies between protein interactions, the knowledge that has been accumulated so far about such interaction dependencies is less standardized and centralized, in comparison to protein interactions. While partial information about interaction dependencies is available in databases, a considerable amount of experimental findings which indicate interaction dependencies are only textually described in scientific publications, rather than standardized for mining. Along this line, we previously established a computational approach for high-throughput mining of protein interaction dependencies from large text corpora.15 Yet, while all of the aforementioned methods gradually lead to accumulation of organized information about interaction dependencies in large biochemical systems, a comprehensive approach to integrate this knowledge for getting a better understanding of large biochemical systems is still required. Contributions So far, no unifying model appears to exist that takes arbitrary types of protein interaction dependencies (beyond mutual exclusiveness) into account. Additionally, previous work rarely considered the concentration of proteins, although it can, in particular combined with interaction dependencies, have a significant impact on the possible complexes. Here we propose a framework and a simulation method for the evaluation of complex assembly on a large scale, for hundreds of proteins with thousands of copies. We use propositional logic to model interaction dependencies, and provide a flexible framework for their system-wide representation that we call a constrained protein interaction network (more specifically, a constrained protein domain–domain interaction network, or just constrained network for short). We present a computational approach to simulate constrained networks for studying steady state and response to perturbations (knockout and overexpression of single or multiple proteins, changes of protein concentrations). We show how this framework enables a fast simulation and the analysis of many proteins in large networks. We then illustrate the benefits of our model on the human adhesome protein network,44–47 with adjusted simulation parameters to match properties of known human protein complexes. The components of the integrin adhesome network are designed to interact with each other in alternative ways to form diverse types of cell–matrix adhesion sites.48 Hence the integrin adhesome provides a good model system, exemplifying the importance of system-level integration of dependencies between protein interactions for understanding self-organization processes.15,45,47 By comparing complex formation with known dependencies against complex formation without dependencies, we show that interaction dependencies limit the resulting complex sizes and influence the fraction of singleton proteins of each type. We illustrate how our model enables us to study the effects of perturbations like knockout or overexpression of proteins. Thereby, we show how the interplay of network topology and interaction dependencies guides the propagation of perturbation effects across the network. To allow others to investigate these effects, we offer our simulation software CPINSim (Constrained Protein Interaction Network Simulator) under the MIT license at http://github.com/BiancaStoecker/cpinsim and as a Bioconda package (https://bioconda.github.io). Methods Constrained protein interaction networks: model A protein–protein interaction network may be formalized as an undirected graph (P,I) with a vertex p ∈ P for each protein and an undirected edge {p,p′} ∈ I for each possible interaction. In this sense, the graph describes all potential interactions, not a concrete state of interacting proteins. Sometimes there are several possible interactions between two proteins, which can be distinguished by different binding domains. Therefore, a more fine-grained model is helpful that considers interactions between domains of proteins. Definition 1 (Domain interaction network). A protein domain is a pair (p,d) consisting of a protein name and a domain name. Two protein domains (p1,d1) and (p2,d2) belong to the same protein if p1 = p2. A domain interaction network is an undirected graph (P,I) whose nodes are protein domains (p,d) ∈ P and whose edges are domain interactions {(p1,d1),(p2,d2)} ∈ I. A domain interaction network (P,I) can be projected down to a protein interaction network (P′,I′) by defining P′ := {p|(p,d) ∈ P} and I′ := {{p1,p2}|{(p1,d1),(p2,d2)} ∈ I}. We now present a method for incorporating dependencies between domain interactions. Our method is based on propositional logic.49 Definition 2 (Propositional logic). The propositional logic ℬcop(Q) over a set Q (the atomic units of the logic) is the smallest set of formulas such that • ⊤ (True) and ⊥ (False) are formulas. • q itself is a formula for all q ∈ Q, • if ϕ, ϕ′ are formulas, so are ¬ϕ, ϕ ∧ ϕ′, ϕ ∨ ϕ′, and ϕ ⇒ ϕ′. (The operators ¬, ∧, ∨, ⇒ have the usual semantics “not”, “and”, “or”, and “implies”, respectively. The implication ϕ ⇒ ϕ′ is equivalent to ¬ϕ ∨ ϕ′.) In our application, the atomic units of the logic are the interactions I. Thereby, the satisfiability of an interaction i ∈ I represents whether it is possible or not in a given state (e.g., in partially assembled complexes). We describe interaction dependencies via propositional logic formulas with a particular structure (“constraints”). Definition 3 (Constraint for an interaction dependency). A constraint is a propositional logic formula of the form i ⇒ ψ with i ∈ I and Ψ∈ℬcop(I). With ℭ(I)⊆ℬcop(I) we denote the set of all possible constraints over I. A constraint i ⇒ ψ restricts the satisfiability of i by the satisfiability of ψ. In other words: formula ψ is a necessary condition for interaction i. For example, the dependency of an interaction i on an allosteric effect due to a scaffold interaction j can be formulated by the constraint i ⇒ j. Mutual exclusiveness of two interactions i,j ∈ I can be modeled by the two (equivalent) constraints i ⇒ ¬j and j ⇒ ¬i. Fig. 1 shows some examples graphically. Using propositional logic also allows defining constraints of higher order: an interaction i could depend on an arbitrary scaffold interaction of a given set j1,…,jn, which is modeled by the formula i ⇒ (j1∨⋯∨jn). For example the interaction of F-ACTIN with VCL becomes possible by either ACTN1 or TLN1 binding to VCL. This leads to the constraint {VCL,F-ACTIN}⇒({VCL,ACTIN1}∨{VCL,TLN1}). 1 Protein domains have been omitted for readability. By combining multiple constraints, it is possible to model arbitrary combinations of allosteric effects and steric hindrance. Now, we can define constrained protein interaction networks as a set of protein domains (nodes) connected by interactions (edges) extended by a set of constraints (dependencies between edges). Definition 4 (Constrained protein domain–domain interaction network). Let (P,I) be a domain interaction network. Let C⊆ℭ(I) be a set of constraints according to Definition 3. Then the triple (P,I,C) is called a constrained protein domain–domain interaction network, or constrained network for short. Simulation of protein complex formation A constrained network allows us to approximate the behavior of real proteins in a cell via simulations. For a constrained network (P,I,C), we consider np copies of each protein p to be present in the system. Together, the domains of these protein copies form a graph. Edges represent currently happening interactions between domains. In addition, we consider all domains of the same protein to be implicitly connected. Hence, the connected components of the graph represent protein complexes. Initially, each complex is a singleton protein: there are no interactions. We abstract from the spatial location of the proteins, and perform our simulation stepwise by repeatedly conducting two phases. In the association phase, each protein copy can (randomly) form new associations according to the current state, the possible interactions and the interaction constraints. In the dissociation phase, existing interactions probabilistically dissociate, potentially breaking large complexes into smaller ones. These phases are repeated until certain observable quantities reach stable levels (“convergence”, see below). In the association phase, we iterate over all protein copies. For each copy, with a given association probability α, a new interaction is attempted (with the complementary probability 1 − α, the protein copy will do nothing in this phase). For protein copy p we have ∑p′:{(p,d),(p′,d′)}∈Inp′ different possible interactors to choose from. To attempt a new interaction, first an interactor p′ and then a specific domain interaction i = {(p,d)}, {(p′,d′)} are randomly chosen from the possible interactions not yet established with p. It is then checked whether the proposed interaction i is valid, i.e., that no constraint will be violated if it is established. For this, we consider the conjunction of i with all present interactions Ip ⊆ I and Ip′ ⊆ I involving protein p or p′ respectively and all constraints of the new interaction i. Consider the propositional logic formula fi:=i∧∧j∈Ipj∧∧k∈Ip′k∧∧c=(i⇒Ψ)∈ℭ(I)c. 2 The interaction i can be formed if and only if fi is satisfiable. Essentially, satisfiability means that none of the constraints c contradicts the conjunction of the new and the existing interactions. Satisfiability of fi is checked (see below for an efficient algorithm), and interaction i is added to the simulation state in the affirmative case. If the proposed interaction is not possible in the current state, it is not added; this leads to an effective rate less than α for new associations. In the dissociation phase, we iterate over each existing interaction and remove it with dissociation probability β. We do not check whether any constraints are violated after removal. This is motivated by the following reasoning. Consider an allosteric activation where proteins A and C can only interact if already an interaction between A and B is present. Assume a state where both interactions exist for specific copies of A, B and C. Now the interaction between A and B may dissociate without removing the interaction between A and C. So while that interaction is necessary for the formation of the interaction A and C, it is not necessary for maintaining it. This simplification is based on the assumption that once the allosteric activator B enabled the binding of protein C to protein B, the bound C locks the conformation of B in the state which is compatible for allowing this interaction. An example for such binding-mediated conformation locking is the binding of Vinculin (VCL) to Talin (TLN1), which depends on the mechanical stretching of Talin and then inhibits Talin refolding after the force is released.50 We conduct the simulation until a steady state has been reached. Informally, this is a state where subsequent simulation steps change neither the total number of interactions (edges) in the simulation network nor the distribution of complex sizes in the network. As a proxy for the size distribution, we consider the fraction of singleton proteins (i.e., the number of non-interacting protein copies, divided by the total number of protein copies in the simulation). As we start with no interactions, during the initial steps, the number of interactions (edges) grows until association and dissociation reach a balance where the number of interactions stabilizes. Formally, we detect this point in step s when the mean number of edges over the last ten steps (s − 9,…,s) is smaller than the previous ten-step mean (steps s − 10,…,s − 1). We then continue the simulation for another s steps to monitor the behavior and ensure that both the number of interactions and the proteins' singleton fractions have stabilized. So the simulation runs for 2s steps when in step s the convergence criterion is first satisfied. Fig. 2 visualizes the process. Fig. 2 View largeDownload slide Schematic visualisation of the simulation procedure. Starting without any occurrence of protein interactions, association and dissociation events alternate until the convergence criterion is first satisfied after s steps. Then the simulation continues for another s steps to validate convergence, and the resulting state of the system is examined. Fig. 2 View largeDownload slide Schematic visualisation of the simulation procedure. Starting without any occurrence of protein interactions, association and dissociation events alternate until the convergence criterion is first satisfied after s steps. Then the simulation continues for another s steps to validate convergence, and the resulting state of the system is examined. An efficient algorithm for checking constraints We now discuss how the decision whether a proposed interaction i = {(p, d), (p′, d′)} does not violate any constraints can be made quickly during the simulation. This is of importance because potentially hundreds of thousands such decisions must be made during a single simulation run. Recall that we need to evaluate whether the formula fi given by (2) is satisfiable, where Ip,Ip′ in (2) are the sets of existing interactions involving the same protein copies p,p′ as in i. Since i and all active interactions j and k have to be present, we can omit the first half of the formula and simplify the last part to ∧c=(i⇒Ψ)∈CΨ ⁠. Note that most of the ψ will consist of a single literal (e.g., a negated interaction in case of mutual exclusion). Only in the case of higher order constraints (see eqn (1)), a disjunction remains after the simplification. These should be rare in practice. Now, we precompute the equivalent disjunctive normal form (DNF). A logic formula is in disjunctive normal form if and only if it is a disjunction of clauses, where each clause is a conjunction of one or more literals.49 In other words, we transform the constraints into the form (ℓ1,1∧ℓ1,2∧…∧ℓ1,n1)∨…∨(ℓm,1∧…∧ℓm,nm), where each .,. is an interaction j or a negated interaction ¬j. Each clause of the DNF then represents one conjunction of interactions that have to be present or absent (if occurring negated) in order for interaction i to be possible. If a clause evaluates to true when setting the already present interactions Ip and Ip′ to true and all other interactions to false, we know that the formula fi is satisfiable. In theory, the conversion to DNF could lead to an exponential growth of the number of clauses, but as shown above, we expect most constraints to be simple, consisting of a single literal. Hence, the DNF can be calculated as follows. First, all single literals are combined into a conjunction ϕ. Second, for the first disjunction l1 ∨ l2 ∨…, we spawn a conjunction ϕ ∧ li for each literal li and go on recursively with the next disjunction. Once the recursion is completed, we have ∏(i⇒Ψ)∈C|Ψ| clauses where |ψ| is the number of literals in the disjunction ψ. Consider the subnetwork in Fig. 3 with a current simulation state where one copy of protein A is already interacting with a copy of protein D and the questioned interaction is i = {(A,d1),(B,d4)}. The full formula from above is fi:={(A,d1),(B,d4)} ∧{(A,d2),(D,d6)} ∧{(A,d1),(B,d4)}⇒¬{(A,d3),(E,d7)} ∧{(A,d1),(B,d4)}⇒{(A,d2),(C,d5)}∨{(A,d2),(D,d6)}. The two bottom rows represent the constraints and can be transformed into the equivalent DNF {(A,d2),(C,d5)}∧¬{(A,d3),(E,d7)} ∨{(A,d2),(D,d6)}∧¬{(A,d3),(E,d7)}. If one of the clauses is satisfied given the proposed and the existing interactions (first two rows in the formula above), then the proposed interaction is possible and does not violate any constraints. Fig. 3 View largeDownload slide Examples for formulation of interaction constraints and their bit vector representation. Left: A subnetwork for a host protein A with its interaction dependencies. The interaction of A with B has two independent allosteric activators, C and D, which compete for the same domain on A. Additionally, protein E is an allosteric inhibitor for the interaction between A and B. Middle: Formulation of the illustrated interaction constraints. Right: Bit vector representations of the constraints for protein A. Indices are assigned in lexicographical order. Since the interaction with B has two possible interactors, there are two clauses in the DNF and thus two pairs of bit vectors. Fig. 3 View largeDownload slide Examples for formulation of interaction constraints and their bit vector representation. Left: A subnetwork for a host protein A with its interaction dependencies. The interaction of A with B has two independent allosteric activators, C and D, which compete for the same domain on A. Additionally, protein E is an allosteric inhibitor for the interaction between A and B. Middle: Formulation of the illustrated interaction constraints. Right: Bit vector representations of the constraints for protein A. Indices are assigned in lexicographical order. Since the interaction with B has two possible interactors, there are two clauses in the DNF and thus two pairs of bit vectors. For each possible interaction in the system, there is one such DNF which has to be evaluated fast. For this, we propose the following approach. We first observe that each protein has a limited number of possible binding partners (Fig. 6). This limits the size of the DNF clauses. For each protein, we encode the DNFs of the possible interactions using bit vectors. This representation does not change during the simulation and is shared by all copies of a protein. In addition, for each copy p of a protein, we represent the state of currently active interactions Ip ⊆ I in another bit vector. This bit vector is updated whenever p enters or leaves an interaction. For a potential interaction i = {(p,d), (p′,d′)}, the satisfiability of fi can then be efficiently checked by evaluating the bit vector representations of the corresponding DNFs for both p and p′. In the following we present the details of the representation. We enumerate the interactions of a protein in a convenient order and assume that the index kj of an interaction j can be obtained in constant time. Then, for each potential interaction of a protein, we represent each clause of the corresponding DNF by two bit vectors b+ and b−. In bit vector b+, we store the positive literals: we set the kj-th bit to one if interaction j occurs in a positive literal. The bit vector b− stores the negative literals by setting the kj-th bit to one if interaction j occurs in a negative literal. The state of currently present interactions of protein copy p is represented by a bit vector b* with the kj-th bit set to one for each present interaction j ∈ Ip (the same is additionally done with another bit vector for Ip′). Then, the satisfiability of the DNF can be calculated by iterating over the clauses and checking each clause against the status vector. If b+ & b* = b+ and b− & ¬b* = b− (with ¬b* being the bitwise negation of b* and & the bitwise conjunction), we know that one clause evaluates to true. Once the iteration reaches the first satisfiable clause, we can stop, knowing that the DNF is satisfiable. In our example we assign the indices lexicographically and get (0010,1001) and (0100,1001) as bit vectors (b+,b−) for the two clauses in the DNF. In both vector pairs the least significant (rightmost) bit representing the interaction with B is set in the vector b−. This is to ensure that A and B are not already interacting with each other. The other set bit in b− is for the allosteric inhibition of E. The two different set bits in b+ represent the independent allosteric activators C and D. The bit vector for the current state of active interactions is b* = (0100). In the example, we have b+&b*=0100&0010=0000≠b+ for the first clause. In contrast, the second clause is satisfiable with b+&b*=0100&0010=0100=b+ and b−&¬b*=1001&¬0100=1001=b−. Hence, the constraints are not violated and the example interaction is possible. Simulation of perturbations With a constrained network (P,I,C), we may simulate not only the given network, but also its perturbations. Typical perturbations are changes in the levels of specific proteins, for example due to their knockout or overexpression. Recall that our model considers a copy number (or expression) np for each protein p. Assuming that these np copies represent a typical state of the constrained network, we can simulate a perturbation by modifying the expression of a particular protein p with a factor, i.e. np′ := o·np. An overexpression is equivalent to a factor greater than 1, whereas down regulation of a protein corresponds to a factor less than 1. A factor o = 0 describes a complete knockout, where no copies of the protein are left. It is of course possible to combine overexpression or knockout of different proteins in the same simulation run. Construction of constrained protein interaction networks Often, it is of interest to study a certain subnetwork, that is characterized by a set of proteins P0. At the boundaries of such a subnetwork, there will be interactions that are constrained by proteins that are not part of P0. Not considering such constraints would lead to biased results. In the following, we provide a solution that includes outside proteins such that these constraints are considered as well. Given an initial set P0 of proteins, protein domain interactions I0 (that may involve additional proteins not contained in P0) and a set C0 of constraints over I0, we construct a constrained network (P,I,C) as follows. First, note that a non-trivial constraint c = (i ⇒ ψ) involves at least three proteins, two in interaction i and at least an additional one in ψ. Let P(c) be the (multi)set of proteins mentioned in constraint c. (1) Select all proteins from P0. (2) Select the subset C1 ⊂ C0 of constraints that mentions at least two proteins from the initial protein set P0, i.e. C1 := {c:|P(c) ∩ P0| ≥ 2}. (3) Select all proteins mentioned in C1; i.e., define P1 := ∪c∈C1P(c). (4) To extend the currently selected protein set, consider constraints C2 ⊂ C0 that have an influence on the previous selected proteins; i.e. C2 := {c:|P(c) ∩ (P0 ∪ P1)| ≥ 2} and select proteins P2 := ∪c∈C2P(c). (5) Define proteins P := P0 ∪ P1 ∪ P2, interactions I := {{(p1,d1),(p2,d2)}|(pi,di) ∈ P,{(p1,d1),(p2,d2)} ∈ I0}, and constraints C := C1 ∪ C2. An example for this procedure, based on an initial small subset of the human adhesome network, is shown in Fig. 4. Fig. 4 View largeDownload slide An illustrative example of the construction process of a constrained protein-interaction-network. Starting with an initial protein set P0, we seek to find a minimal network encapsulating P0 while not discarding interesting constraints. (A) A section of the complete constrained network. The diamond-shaped nodes are proteins from the set P0 (human adhesome network), the oval-shaped nodes are proteins not from P0. Black lines indicate interactions and red arcs indicate constraints between interactions. (B) Selection (blue) of all constraints and corresponding proteins and interactions, where at least two proteins are from P0. (C) Selection (green) of proteins whose constraints have an influence on the previously selected proteins. (D) The resulted constrained protein-interaction-network for the initial protein set P0. Fig. 4 View largeDownload slide An illustrative example of the construction process of a constrained protein-interaction-network. Starting with an initial protein set P0, we seek to find a minimal network encapsulating P0 while not discarding interesting constraints. (A) A section of the complete constrained network. The diamond-shaped nodes are proteins from the set P0 (human adhesome network), the oval-shaped nodes are proteins not from P0. Black lines indicate interactions and red arcs indicate constraints between interactions. (B) Selection (blue) of all constraints and corresponding proteins and interactions, where at least two proteins are from P0. (C) Selection (green) of proteins whose constraints have an influence on the previously selected proteins. (D) The resulted constrained protein-interaction-network for the initial protein set P0. It remains to discuss how and where to obtain information on interactions and interaction dependencies, I0 and C0 in the terminology above. While protein interactions are available from various databases,51–53 information about interaction dependencies is not yet collected systematically for various reasons discussed in the Introduction. In the past, we have had success with a semi-automated text mining approach on the human adhesome network.15 Further, competitions on binding domains can be inferred from domain interaction databases, such as DOMINO.52 In those databases each protein interaction is annotated with a binding domain on each protein, i.e., an interval of positions in the amino acid sequence, such as the interval [540,906]. We assume that two proteins compete for the same domain if the domains of the interactions are overlapping each other. If we have a competition between proteins without domain annotations (e.g., obtained by text mining), but each involved protein has a unique domain involved in interactions in the dataset, we assume that the constraint involves the same domains. If we cannot infer the domain in this way, we create artificial unique domains. For allosteric effects we assume that interactor and activator/inhibitor each bind to a different domain of a host protein, while competitors are all assigned to the same domain of the host. Results We first describe the used adhesome interaction network and some of its statistics, as well as the chosen simulation parameters. Then we present our results regarding the running time and convergence of the simulation. Finally, we discuss the effect of constraints on complex formation and demonstrate the propagation of perturbations in the constrained network in contrast to the unconstrained network. Construction of the constrained extended integrin adhesome network Since not many interaction dependencies are known, we selected a network with a high density of known constraints. In previous work, we discovered 71 interaction dependencies for the human adhesome network by systematically mining a collection of over 50 000 full-text articles,15 where we searched for dependencies with at least one involved protein from the adhesome. Further we inferred competitions on binding domains from the domain interaction database DOMINO.52 We started with the human adhesome proteins as initial set P0 in the construction described above. In this initial network there are 121 proteins and 392 interactions (between only these proteins) as well as 139 competitions and 2 allosteric effects (resulting from text mining and DOMINO). The interactions between all selected proteins were taken from the HINT database (only binary interactions53). Applying the described network construction leads to a network with 718 proteins (Table 1 and Fig. 5). Table 1 Characteristics of the constrained protein interaction network used for simulation (human adhesome network): row “initial” refers to the network consisting of only nodes from the adhesome (P0). Row “extended” describes the complete constrained network constructed incrementally from P0, as described in Methods Network Proteins Interactions Competitions Allosteric effects Initial 121 392 139 2 Extended 718 2933 2753 21 Network Proteins Interactions Competitions Allosteric effects Initial 121 392 139 2 Extended 718 2933 2753 21 View Large Table 1 Characteristics of the constrained protein interaction network used for simulation (human adhesome network): row “initial” refers to the network consisting of only nodes from the adhesome (P0). Row “extended” describes the complete constrained network constructed incrementally from P0, as described in Methods Network Proteins Interactions Competitions Allosteric effects Initial 121 392 139 2 Extended 718 2933 2753 21 Network Proteins Interactions Competitions Allosteric effects Initial 121 392 139 2 Extended 718 2933 2753 21 View Large Fig. 5 View largeDownload slide The constructed protein network as used for simulation. The number of interactors is indicated by the color and size of the node. Yellow nodes indicate proteins that have a single interactor, while the number of interactors increases with the size and bluish-level of the node. Key proteins with many interactors are shown by name. For clarity, the dependencies between interactions are not overlaid. Fig. 5 View largeDownload slide The constructed protein network as used for simulation. The number of interactors is indicated by the color and size of the node. Yellow nodes indicate proteins that have a single interactor, while the number of interactors increases with the size and bluish-level of the node. Key proteins with many interactors are shown by name. For clarity, the dependencies between interactions are not overlaid. In the resulting network, 139 proteins have only one domain and one interactor, while most of the proteins have between two and six interactors. Interactors are counted with regard to the domains, meaning that the same protein is counted as two interactors if the interactions are at different domains. The distribution of the number of interactors is shown in Fig. 6 (left). Fig. 6 View largeDownload slide Statistics of the proteins in the constructed constrained network. Domain aware means that interactors and constraints are counted per domain. So if B can interact with A on two different domains of A, then B is counted as two interactors. Left: Frequencies of the number of interactors; there are 17 outliers with more than 35 interactors: 38, 44, 47, 47, 47, 55, 87, 103, 118, 119, 123, 127, 135, 166, 178, 182, 242. Middle: Frequencies of the number of constraints; there are 12 outliers with more than 35 constraints: 42, 72, 73, 95, 102, 103, 106, 114, 122, 153, 166, 168. Right: Histogram of ratios of the number of constraints over the number of interactors. There are 219 proteins with a ratio of 1. Fig. 6 View largeDownload slide Statistics of the proteins in the constructed constrained network. Domain aware means that interactors and constraints are counted per domain. So if B can interact with A on two different domains of A, then B is counted as two interactors. Left: Frequencies of the number of interactors; there are 17 outliers with more than 35 interactors: 38, 44, 47, 47, 47, 55, 87, 103, 118, 119, 123, 127, 135, 166, 178, 182, 242. Middle: Frequencies of the number of constraints; there are 12 outliers with more than 35 constraints: 42, 72, 73, 95, 102, 103, 106, 114, 122, 153, 166, 168. Right: Histogram of ratios of the number of constraints over the number of interactors. There are 219 proteins with a ratio of 1. There are 50 proteins that are not part of a constraint. Most proteins are part of one constraint, while some proteins are part of over 100 constraints. The distribution of the number of constraints is shown in Fig. 6 (middle). Choice of simulation parameters for the extended human adhesome network Given the constrained network, the main parameters to adjust are the association probability α and the dissociation probability β (see Methods, Simulation of protein complex formation). The choice is guided by two criteria. First, the resulting complex size distribution should approximately reproduce known complex size distributions. Especially, we want to avoid the formation of overly large unrealistic complexes (several hundreds to thousands of proteins). Second, as the simulation consists of two discrete steps (association and dissociation), it has to be sufficiently fine-grained that the complex size distributions after association and dissociation phase do not differ significantly in the steady state. This excludes large probabilities. We systematically evaluated different combinations of α and β and found that the reasonable parameter space is restricted to α ≤ 0.1 and β = f·α with a factor f in the interval [2.0,10.0]. In this parameter range, we compared the simulated complex size distribution at steady state with the complex size distribution of known complexes. The known complexes were taken from the CORUM database provided by the Munich Information center for Protein Sequences (MIPS).54 The database contains manually annotated protein complexes from mammalian organisms without regard to the connections of proteins in the complexes or the multiplicity of proteins. Fig. 7 shows a histogram of complex sizes for human complexes in the CORUM database (left) and the complementary cumulative distribution functions (ccdf) of CORUM complexes and simulated complexes for a particular parameter set (α = 0.005, β = 0.0125; right). For complexes with more than 20 different proteins, information is sparse, and there are only one or two known complexes of each of those sizes. This can be explained by the difficulty of experimentally finding big complexes and represents a bias in the distribution. It can be assumed that the distribution is more accurate for the smaller complexes and thus the consistency between the known and simulated complexes for complexes with size below 20 is an indicator for a good parameter combination. The shown distribution (α = 0.005, β = 0.0125 = 2.5α) was the best fitting parameter set. Therefore those parameters were used for all following evaluations. If not stated otherwise, we consider np = 1000 copies for each protein. Fig. 7 View largeDownload slide Left: Histogram of complex sizes for human complexes in CORUM database. There are 14 outliers beyond 30: 31, 32, 33, 34, 36, 37, 44, 47, 48, 78, 80, 81, 104, 143. Complex sizes larger than 20 (yellow line) occurred only once or twice. Right: Comparison of the complementary cumulative distribution function (ccdf) of simulated complex sizes with constraints for α = 0.005, β = 0.0125 (averaged over 50 runs) against the distribution from the CORUM database. Cumulative distributions are capped at complex sizes where the absolute complex frequency drops below 10. Complex sizes above 20 (yellow line) occur only once or twice. Fig. 7 View largeDownload slide Left: Histogram of complex sizes for human complexes in CORUM database. There are 14 outliers beyond 30: 31, 32, 33, 34, 36, 37, 44, 47, 48, 78, 80, 81, 104, 143. Complex sizes larger than 20 (yellow line) occurred only once or twice. Right: Comparison of the complementary cumulative distribution function (ccdf) of simulated complex sizes with constraints for α = 0.005, β = 0.0125 (averaged over 50 runs) against the distribution from the CORUM database. Cumulative distributions are capped at complex sizes where the absolute complex frequency drops below 10. Complex sizes above 20 (yellow line) occur only once or twice. Next, we examined how many simulation steps were needed to reach convergence (steady state). For the chosen parameter set, between 250 and 350 steps were required (Fig. 8). The convergence criterion is based on the edge density (number of interactions) in simulated complexes, which remains stable after meeting the criterion, together with the number of unbound proteins (singletons). Fig. 8 View largeDownload slide Steady-state statistics of one simulation run (with constraints); the convergence criterion is satisfied after 269 steps (yellow line), and the simulation continues for the same number of steps. Other simulations ran for comparable numbers of steps. Top: Edge density (number of interactions divided by total number of protein copies in the simulation). Bottom: Singleton fraction (number of singleton proteins divided by total number of protein copies). Fig. 8 View largeDownload slide Steady-state statistics of one simulation run (with constraints); the convergence criterion is satisfied after 269 steps (yellow line), and the simulation continues for the same number of steps. Other simulations ran for comparable numbers of steps. Top: Edge density (number of interactions divided by total number of protein copies in the simulation). Bottom: Singleton fraction (number of singleton proteins divided by total number of protein copies). Running times and reproducibility of simulations In principle, the number of proteins, interactions, and constraints in a simulation is not limited, except by the available memory and, to a lesser degree, computation time. We simulated the described network with 718 protein types and np copies of each protein for different values of np on a single thread of an Intel Core i7-4790 K processor at 4.00 GHz with the time and memory requirements shown in Table 2. We see that even large copy numbers can be handled in a reasonable amount of time and with an amount of RAM that is typically available on today's common desktop PCs. Table 2 Time and memory requirements for the simulation of complex formation in the extended adhesome network with 718 protein types and different copy numbers of each protein. Numbers are given for a single run, averaged over 10 runs, on a single thread of an Intel Core i7-4790 K processor at 4.00 GHz Copies Time [min:s] Memory [GB] 1000 04:00 1.23 2000 08:30 2.33 3000 15:03 2.64 4000 21:02 3.80 5000 26:44 4.49 6000 32:05 5.16 7000 36:58 6.31 8000 44:27 7.49 9000 48:20 8.17 Copies Time [min:s] Memory [GB] 1000 04:00 1.23 2000 08:30 2.33 3000 15:03 2.64 4000 21:02 3.80 5000 26:44 4.49 6000 32:05 5.16 7000 36:58 6.31 8000 44:27 7.49 9000 48:20 8.17 View Large Table 2 Time and memory requirements for the simulation of complex formation in the extended adhesome network with 718 protein types and different copy numbers of each protein. Numbers are given for a single run, averaged over 10 runs, on a single thread of an Intel Core i7-4790 K processor at 4.00 GHz Copies Time [min:s] Memory [GB] 1000 04:00 1.23 2000 08:30 2.33 3000 15:03 2.64 4000 21:02 3.80 5000 26:44 4.49 6000 32:05 5.16 7000 36:58 6.31 8000 44:27 7.49 9000 48:20 8.17 Copies Time [min:s] Memory [GB] 1000 04:00 1.23 2000 08:30 2.33 3000 15:03 2.64 4000 21:02 3.80 5000 26:44 4.49 6000 32:05 5.16 7000 36:58 6.31 8000 44:27 7.49 9000 48:20 8.17 View Large We assessed whether the simulations generated reproducible results, both with and without constraints. For this, we compared the complex abundances for different runs against each other. We abstracted from network topology and multiplicity of proteins within complexes, and only considered the sets of contained proteins. Fig. 9 shows the abundances of different complexes in different simulation runs, grouped by complex size. Apart from singletons, most complexes did not occur often. Yet, complexes that occurred in multiple runs tended to occur with similar frequency. Both with and without constraints, our simulation produced reproducible complexes. Fig. 9 View largeDownload slide Density plot comparing the abundances of complexes between simulation repeats. Abundances are accumulated over the 4950 = (100·99)/2 pair-wise combinations of 100 simulation runs, with constraints (top row) or without constraints (bottom tow). For this comparison, complexes were considered equal if their protein sets are equal (disregarding protein multiplicities and interactions). Note the different scales, as large complexes are formed less frequently than small complexes or singletons. Fig. 9 View largeDownload slide Density plot comparing the abundances of complexes between simulation repeats. Abundances are accumulated over the 4950 = (100·99)/2 pair-wise combinations of 100 simulation runs, with constraints (top row) or without constraints (bottom tow). For this comparison, complexes were considered equal if their protein sets are equal (disregarding protein multiplicities and interactions). Note the different scales, as large complexes are formed less frequently than small complexes or singletons. Complex sizes with and without constraints To evaluate which effect the interaction dependencies have on the simulation, we compared the simulation results with and without constraints. For each set of constraints we did 100 simulation runs with the chosen parameters. The complex size distributions at steady state are compared in Fig. 10. As could be expected, simulations without constraints lead to significantly larger complexes than simulations with constraints. Fig. 10 View largeDownload slide Left: Complementary cumulative distribution functions (ccdf, log scale) of protein complex sizes at steady state for 100 simulation runs with constraints (blue) and without constraints (green). The bold line depicts the mean; the shaded area depicts minimum and maximum. Right: Cumulative distribution function (cdf, linear scale) of the same runs for complex sizes ≤35. Fig. 10 View largeDownload slide Left: Complementary cumulative distribution functions (ccdf, log scale) of protein complex sizes at steady state for 100 simulation runs with constraints (blue) and without constraints (green). The bold line depicts the mean; the shaded area depicts minimum and maximum. Right: Cumulative distribution function (cdf, linear scale) of the same runs for complex sizes ≤35. When using constraints, the maximum complex size is 75 (averaged over runs). All simulations without constraints develop one large complex accumulating nearly all different protein types in the network. This complex has an average size of 84 182 and no biological relevance. Further, the simulations without constraints have fewer singleton proteins at steady state than the simulations with constraints. Characterization of perturbation effects In order to illustrate the capability of our framework to estimate effects of perturbations, we chose three proteins with different roles in the network, i.e. different numbers of interactors and constraints (Table 3): CRK, YWHAG and ABAT. Both CRK and YWHAG have a large number of interactors and interactions, but they differ in the number of domains and additionally in their role in the network (Fig. 5): CRK's interactors include several proteins which have themselves many interactors, while YWHAG 's interactors frequently have no other interactors than YWHAG itself. In contrast, ABAT is at the periphery of the network with a single interactor. Table 3 Proteins chosen for perturbation simulations (5-fold overexpression and complete knockout). For each protein, we list its number of interactors (proteins with at least one interaction with the given protein), number of model domains (including artificial unique domains, see “construction of constrained protein interaction networks” above), number of interactions (at least as high as the number of interactors), and the number of constraints in which the protein participates Protein Interactors Interactions Domains Constraints CRK 124 135 22 122 YWHAG 119 119 6 114 ABAT 1 1 1 1 Protein Interactors Interactions Domains Constraints CRK 124 135 22 122 YWHAG 119 119 6 114 ABAT 1 1 1 1 View Large Table 3 Proteins chosen for perturbation simulations (5-fold overexpression and complete knockout). For each protein, we list its number of interactors (proteins with at least one interaction with the given protein), number of model domains (including artificial unique domains, see “construction of constrained protein interaction networks” above), number of interactions (at least as high as the number of interactors), and the number of constraints in which the protein participates Protein Interactors Interactions Domains Constraints CRK 124 135 22 122 YWHAG 119 119 6 114 ABAT 1 1 1 1 Protein Interactors Interactions Domains Constraints CRK 124 135 22 122 YWHAG 119 119 6 114 ABAT 1 1 1 1 View Large We simulated 50 runs for each of the selected proteins with five-fold overexpression and with complete knockout of the protein, both with and without constraints. Changes in complex sizes We investigated how perturbations change the complex size distributions (Fig. 11). As may be expected, perturbations of ABAT did not detectably influence the complex size distribution, neither with nor without constraints. A plausible explanation is that ABAT only has a single interactor, so its influence on the network is limited. Fig. 11 View largeDownload slide Impact of perturbations on the complex size distribution for CRK (left), ABAT (middle), YWHAG (right) under constraints (blue) and without constraints (green). Top row shows the mean complementary cumulative distribution functions (ccdf, 100 runs each with and without constraints). Solid lines depict unperturbed state, dashed lines overexpression and dotted lines knockout. The bottom row shows the numbers for singletons (unbound proteins). Fig. 11 View largeDownload slide Impact of perturbations on the complex size distribution for CRK (left), ABAT (middle), YWHAG (right) under constraints (blue) and without constraints (green). Top row shows the mean complementary cumulative distribution functions (ccdf, 100 runs each with and without constraints). Solid lines depict unperturbed state, dashed lines overexpression and dotted lines knockout. The bottom row shows the numbers for singletons (unbound proteins). Similar to ABAT, perturbation of CRK had no visible global effect on larger complexes. However, singleton fractions differed with the type of perturbation (overexpression vs. knockout) and whether constraints were included in the model or not. In addition to being a central hub in the network, CRK is also a limiting factor with a small singleton fraction (i.e., most copies were bound) in the unperturbed simulations (with constraints: 0.011, without: 0.0), so a noticeable effect was to be expected. If CRK is knocked out, it can no longer act as a hub, and we observe more singleton complexes; this is true both with and without constraints. After overexpression of CRK, we observe fewer singleton complexes. Although YWHAG and CRK are comparable concerning their number of interactors and constraints, perturbing YWHAG has different effects than perturbing CRK. With constraints, overexpression of YWHAG has little effect on larger complexes. The reason is that YWHAG does not occur in a dense region of the network, and most interaction partners can only interact with YWHAG itself while inhibiting each other, such that complex size is limited (Fig. 5). On the other hand, a knockout leads to a slight increase in complex sizes. An explanation is that the few interaction partners that connected YWHAG to the rest of the network are now free to enlarge other complexes. Importantly, these effects are only visible when considering interaction dependencies. Without them, knockout leads to a major drop in complex sizes. The reason is that the family of larger complexes around YWHAG that are only possible when ignoring interaction dependencies disappears. With overexpression, there are more smaller complexes and fewer complexes of size ≥10. The reason is that a higher presence of YWHAG increases the probability that one of its interaction partners chooses a free YWHAG instead of increasing the size of an already existing larger complex around YWHAG. Changes in singleton fractions We also examined the influence of perturbations on the singleton fraction of each protein. Fig. 12 left shows the relation between the number of interactors and the singleton fraction of a protein in the constrained network. More interactors mean more interaction possibilities, which lead to more bound copies and thus fewer singleton copies of a protein. The right side of Fig. 12 shows the distribution of singleton fractions for the constrained network versus the unconstrained network. Without constraints, more interactions are possible, and therefore singleton fractions are lower overall than with constraints. Fig. 12 View largeDownload slide Left: Relation between number of interactors and singleton fraction for each protein. The values are averaged over 100 runs with constraints. Right: Influence of constraints on the singleton fraction distribution (averages over 100 runs with constraints and 100 runs without constraints). Fig. 12 View largeDownload slide Left: Relation between number of interactors and singleton fraction for each protein. The values are averaged over 100 runs with constraints. Right: Influence of constraints on the singleton fraction distribution (averages over 100 runs with constraints and 100 runs without constraints). We examined the average singleton fraction of each protein in unperturbed simulations as well as with overexpression and knockout of specific proteins (CRK, YWHAG, ABAT) in comparison to the unperturbed experiment. In Fig. 13, the difference in singleton fraction is shown for each protein for overexpression (y-axis) and knockout (x-axis) of CRK, YWHAG and ABAT separately. We may expect that direct interactors of the perturbed protein (purple dots) are in the bottom right quadrant (more singletons after knockout and fewer singletons after overexpression of direct interactor) and that most of the other dots appear near the center with only small changes. Fig. 13 View largeDownload slide Differences in singleton fractions for perturbations (x-axis: knockout vs. unperturbed; y-axis: overexpression vs. unperturbed) of CRK (top row), ABAT (middle row) and YWHAG (bottom row) in simulations with constraints (left column) and without constraints (right column), averaged over 50 runs. Each dot represents one protein; colors show the distance (shortest path) to the perturbed protein in the interaction network. Since the perturbed protein (red dot) is not always visible, its values are given in the legend. Direct interactors (purple) are mainly expected in the lower right quadrant (more singletons after knockout, fewer singletons after overexpression). Note the different scales for the different proteins. Fig. 13 View largeDownload slide Differences in singleton fractions for perturbations (x-axis: knockout vs. unperturbed; y-axis: overexpression vs. unperturbed) of CRK (top row), ABAT (middle row) and YWHAG (bottom row) in simulations with constraints (left column) and without constraints (right column), averaged over 50 runs. Each dot represents one protein; colors show the distance (shortest path) to the perturbed protein in the interaction network. Since the perturbed protein (red dot) is not always visible, its values are given in the legend. Direct interactors (purple) are mainly expected in the lower right quadrant (more singletons after knockout, fewer singletons after overexpression). Note the different scales for the different proteins. As expected, perturbations of ABAT have no strong effect for all proteins regardless of the distance to ABAT. For perturbations of CRK, we observe the expected result in the network without constraints. However, in the constrained network, several direct interactors can be seen in the bottom left quadrant, meaning that those proteins have fewer singletons after knockout of CRK. A possible explanation is that some direct interactors of CRK may also interact with each other. In the presence of CRK, its direct interactors compete with each other. Once CRK is gone, they are free to form complexes with other proteins, especially other direct interactors of CRK. Importantly, this effect is not seen without considering interaction dependencies: without constraints, purple dots are exclusively observed in the lower right quadrant and near the center. Perturbations of YWHAG have a similar, but overall stronger effect than perturbations of CRK. In the unconstrained network, the majority of direct interactors can be found in the bottom right quadrant as expected, but in the constrained network, the majority shifts to the bottom left quadrant. In summary, as CRK and YWHAG illustrate, consideration of constraints yields qualitatively and quantitatively different effects than considering the plain interaction network. Additionally, perturbations of numerically comparable proteins may lead to different results under constraints because of the local network topology: considering interaction dependencies only of the perturbed protein and its immediate interactors may still be insufficient for foreseeing the outcome of the perturbation. Therefore, a simulation of the complete system, as our approach performs, is essential to ensure all the interactions, interaction dependencies and topological features are taken into account. Discussion We have proposed a simple but powerful framework for formalizing dependencies between protein interactions. As far as we are aware, this is the first framework able to incorporate complex higher-order dependencies beyond direct competitions together with multiple copies per protein. We have shown that interaction dependencies or constraints have a direct effect on complex sizes, and additionally that they interact with local network topology. In fact, our simulations suggest that perturbations may have complex and hard-to-predict effects when taking constraints into account. The simulations are efficient in the sense that networks with a total of over a million protein copies can be simulated to steady state within under ten minutes. Compared to straightforward simulations, we achieved high speed-ups by using the bit vector techniques described in “An efficient algorithm for checking constraints”. The size of the network that can be simulated is thus primarily limited by the available random access memory (see Table 2). In its current form, our model makes several simplifying assumptions. For example, we check constraints only during the association phase but not during the dissociation phase. This decision is never a problem with competitions (most of the constraints in the model), and as we argue in “Simulation of protein complex formation” using the Vinculin/Talin example, we think that it captures important real effects. However, for other examples, reality might be different. By having two sets of constraints, one that has to be maintained during dissociation and one that does not, the model can be easily adapted. Currently, our model ignores the spatial distribution of the proteins, and we have worked with uniform protein copy numbers (or concentrations) as well as uniform association and dissociation probabilities (corresponding to kinetic coefficients). Clearly, this is not realistic if the goal is to completely simulate the real biological processes happening within a cell. However, such a simulation would require much more knowledge about localization, concentrations and kinetics than is available today (late 2017). When this information becomes available, it is straightforward to scale our model accordingly. For example, association and dissociation probabilities can be chosen per interaction without causing a performance penalty, and protein concentrations can already be arbitrarily parameterized in the current implementation. The spatial location of each protein copy can be considered by adding diffusion and movement rules. Since our knowledge of constraints is currently incomplete, the biological relevance of the simulated complexes is limited. However, note that this is a problem with the available data, not with the model or simulation framework itself. Only few databases so far systematically include minable constraints, and most of them are competitions based on overlapping binding domains, as annotated in the DOMINO database,52 or data records from the IntAct database,55 which one may query with the search term pbiorole:competitor to obtain information on interactions where one interaction partner is a competitor. In the coming years, emerging technologies however suggest a rapid increase in the availability of the needed information, e.g., via the large scale generation of libraries of cell lines having two or more endogenously tagged fluorescent proteins,56 and recent high-throughput and multiplexed implementations of fluorescence correlation spectroscopy which allow us to systematically measure endogenous concentrations, binding constants and high-order complexes in such libraries of cell lines.57–62 Inhibiting interactions with rationally designed compounds will provide additional insights.63 For our examples, we chose a network with a high density of known constraints. Unfortunately, the set of proteins in our network has only a small overlap with protein sets in databases of known complexes, such as CORUM,54 so that we cannot directly compare predicted and real complexes. We would currently expect a number of false positive predictions, but we may also expect that the biological relevance of the simulation results will increase jointly with more complete knowledge of constraints. We believe that our results offer important insights already today, as we demonstrated by the difference in shift of singleton fractions of direct and indirect interactors after perturbation, when comparing simulations with constraints and without constraints, but also when comparing perturbations of proteins with different network roles (with constraints); cf.Fig. 13. In the future, we will consider more realistic concentrations (or proxies for more realistic concentrations, such as setting the simulated protein copy number proportional to its number of interactors), more complete dependency data, spatial resolution, and more detailed kinetics. Further, we plan to extend our model to incorporate post-translational modifications such as phosphorylation, since these can also play a role in interaction dependencies. When modeling these as interactions with a special type of node, they can likewise be used within constraints and considered for perturbations. Overall, we believe that constrained networks are a useful and versatile tool for interactomics studies that will improve and scale with increasing knowledge and data about real interaction dependencies. Conflicts of interest There are no conflicts of interest to declare. Acknowledgements S. R. acknowledges funding from the Mercator Research Center Ruhr (MERCUR), project Pe-2013-0012 (UA Ruhr professorship) and from the German Research Foundation (DFG), Collaborative Research Center SFB 876, project C1. J. K. was supported by the Dutch NWO Veni grant 016.Veni.173.076. Notes and references 1 T. C. Walther and M. Mann , J. Cell Biol. , 2010 , 190 , 491 – 500 . Crossref Search ADS PubMed 2 J. R. Parrish , K. D. Gulyas and R. L. Finley , Curr. Opin. Biotechnol. , 2006 , 17 , 387 – 393 . Crossref Search ADS PubMed 3 J. Mehla , J. H. Caufield , N. Sakhawalkar and P. Uetz , Methods Enzymol. , 2017 , 586 , 333 – 358 . Crossref Search ADS PubMed 4 T. Rolland , M. Tasan , B. Charloteaux , S. J. Pevzner , Q. Zhong , N. Sahni , S. Yi , I. Lemmens , C. Fontanillo , R. Mosca , A. Kamburov , S. D. Ghiassian , X. Yang , L. Ghamsari , D. Balcha , B. E. Begg , P. Braun , M. Brehme , M. P. Broly , A.-R. Carvunis , D. Convery-Zupan , R. Corominas , J. Coulombe-Huntington , E. Dann , M. Dreze , A. Dricot , C. Fan , E. Franzosa , F. Gebreab , B. J. Gutierrez , M. F. Hardy , M. Jin , S. Kang , R. Kiros , G. N. Lin , K. Luck , A. MacWilliams , J. Menche , R. R. Murray , A. Palagi , M. M. Poulin , X. Rambout , J. Rasla , P. Reichert , V. Romero , E. Ruyssinck , J. M. Sahalie , A. Scholz , A. A. Shah , A. Sharma , Y. Shen , K. Spirohn , S. Tam , A. O. Tejeda , S. A. Trigg , J.-C. Twizere , K. Vega , J. Walsh , M. E. Cusick , Y. Xia , A.-L. Barabási , L. M. Iakoucheva , P. Aloy , J. De Las Rivas , J. Tavernier , M. A. Calderwood , D. E. Hill , T. Hao , F. P. Roth and M. Vidal , Cell , 2014 , 159 , 1212 – 1226 . Crossref Search ADS PubMed 5 H. Yu , P. Braun , M. A. Yildirim , I. Lemmens , K. Venkatesan , J. Sahalie , T. Hirozane-Kishikawa , F. Gebreab , N. Li , N. Simonis , T. Hao , J.-F. Rual , A. Dricot , A. Vazquez , R. R. Murray , C. Simon , L. Tardivo , S. Tam , N. Svrzikapa , C. Fan , A.-S. de Smet , A. Motyl , M. E. Hudson , J. Park , X. Xin , M. E. Cusick , T. Moore , C. Boone , M. Snyder , F. P. Roth , A.-L. Barabási , J. Tavernier , D. E. Hill and M. Vidal , Science (New York, N.Y.) , 2008 , 322 , 104 – 110 . Crossref Search ADS PubMed 6 Y. K. Lei , Z. H. You , Z. Ji , L. Zhu and D. S. Huang , BMC Bioinf. , 2012 , 13 Suppl 7 S3 . Crossref Search ADS 7 V. S. Rao , K. Srinivas , G. N. Sujini and G. N. Kumar , Int. J. Proteomics , 2014 , 2014 , 147648 . Crossref Search ADS PubMed 8 D. S. Huang , L. Zhang , K. Han , S. Deng , K. Yang and H. Zhang , Curr. Protein Pept. Sci. , 2014 , 15 , 553 – 560 . Crossref Search ADS PubMed 9 S. Koyabu , T. T. Phan and T. Ohkawa , BioMed Res. Int. , 2015 , 2015 , 928531 . Crossref Search ADS PubMed 10 N. Pržulj , BioEssays , 2011 , 33 , 115 – 123 . Crossref Search ADS PubMed 11 E. A. Coker , C. Mitsopoulos , P. Workman and B. Al-Lazikani , PLoS One , 2017 , 12 , e0177701 . Crossref Search ADS PubMed 12 R. A. Laskowski , F. Gerick and J. M. Thornton , FEBS Lett. , 2009 , 583 , 1692 – 1698 . Crossref Search ADS PubMed 13 J. R. Beach , K. S. Bruun , L. Shao , D. Li , Z. Swider , K. Remmert , Y. Zhang , M. A. Conti , R. S. Adelstein , N. M. Rusan , E. Betzig and J. A. Hammer , Nat. Cell Biol. , 2017 , 19 , 85 – 93 . Crossref Search ADS PubMed 14 J.-E. Hoffmann , Y. Fermin , R. L. Stricker , K. Ickstadt and E. Zamir , eLife , 2014 , 3 , e02257 . Crossref Search ADS PubMed 15 J. Köster , E. Zamir and S. Rahmann , Integr. Biol. , 2012 , 4 , 805 . Crossref Search ADS 16 C. Suarez and D. R. Kovar , Nat. Rev. Mol. Cell Biol. , 2016 , 17 , 799 – 810 . Crossref Search ADS PubMed 17 P. Crépieux , A. Poupon , N. Langonné-Gallay , E. Reiter , J. Delgado , M. H. Schaefer , T. Bourquard , L. Serrano and C. Kiel , Front. Endocrinol. , 2017 , 8 , 32 . Crossref Search ADS 18 C. Kiel , E. Verschueren , J.-S. Yang and L. Serrano , Sci. Signaling , 2013 , 6 , ra109 . Crossref Search ADS 19 Z. Itzhaki , PLoS One , 2011 , 6 , e21724 . Crossref Search ADS PubMed 20 S. Srihari , C. H. Yong , A. Patil and L. Wong , FEBS Lett. , 2015 , 589 , 2590 – 2602 . Crossref Search ADS PubMed 21 K. Drew , C. Lee , R. L. Huizar , F. Tu , B. Borgeson , C. D. McWhite , Y. Ma , J. B. Wallingford and E. M. Marcotte , Mol. Syst. Biol. , 2017 , 13 , 932 . Crossref Search ADS PubMed 22 C. Hernandez , C. Mella , G. Navarro , A. Olivera-Nappa and J. Araya , PLoS One , 2017 , 12 , e0183460 . Crossref Search ADS PubMed 23 X. Ma and L. Gao , BMC Syst. Biol. , 2012 , 6 Suppl 1 S6 . Crossref Search ADS PubMed 24 M. Pellegrini , M. Baglioni and F. Geraci , BMC Bioinf. , 2016 , 17 , 372 . Crossref Search ADS 25 S. H. Jung , B. Hyun , W.-H. Jang , H.-Y. Hur and D.-S. Han , Bioinformatics , 2010 , 26 , 385 – 391 . Crossref Search ADS PubMed 26 Y. Ozawa , R. Saito , S. Fujimori , H. Kashima , M. Ishizaka , H. Yanagawa , E. Miyamoto-Sato and M. Tomita , BMC Bioinf. , 2010 , 11 , 350 . Crossref Search ADS 27 W. Ma , C. McAnulla and L. Wang , Biochim. Biophys. Acta, Proteins Proteomics , 2012 , 1824 , 1418 – 1424 . Crossref Search ADS 28 T. Will and V. Helms , Bioinformatics , 2014 , 30 , i415 – i421 . Crossref Search ADS PubMed 29 J. J. Hughey , T. K. Lee and M. W. Covert , Wiley Interdiscip. Rev.: Syst. Biol. Med. , 2010 , 2 , 194 – 209 . Crossref Search ADS PubMed 30 B. N. Kholodenko , Nat. Rev. Mol. Cell Biol. , 2006 , 7 , 165 – 176 . Crossref Search ADS PubMed 31 N. Le Novère , Nat. Rev. Genet. , 2015 , 16 , 146 – 158 . Crossref Search ADS PubMed 32 W. Im , J. Liang , A. Olson , H.-X. Zhou , S. Vajda and I. A. Vakser , J. Mol. Biol. , 2016 , 428 , 2943 – 2964 . Crossref Search ADS PubMed 33 B. Schoeberl , C. Eichler-Jonsson , E. D. Gilles and G. Müller , Nat. Biotechnol. , 2002 , 20 , 370 – 375 . Crossref Search ADS PubMed 34 W. Ma , A. Trusina , H. El-Samad , W. A. Lim and C. Tang , Cell , 2009 , 138 , 760 – 773 . Crossref Search ADS PubMed 35 M. K. Morris , J. Saez-Rodriguez , P. K. Sorger and D. A. Lauffenburger , Biochemistry , 2010 , 49 , 3216 – 3224 . Crossref Search ADS PubMed 36 C. Kiel , P. Beltrao and L. Serrano , Annu. Rev. Biochem. , 2008 , 77 , 415 – 441 . Crossref Search ADS PubMed 37 C. Kiel and L. Serrano , Curr. Opin. Biotechnol. , 2012 , 23 , 305 – 314 . Crossref Search ADS PubMed 38 C. Sánchez Claros and A. Tramontano , PLoS One , 2012 , 7 , e38765 . Crossref Search ADS PubMed 39 H. Park , H. Lee and C. Seok , Curr. Opin. Struct. Biol. , 2015 , 35 , 24 – 31 . Crossref Search ADS PubMed 40 I. A. Vakser , Biophys. J. , 2014 , 107 , 1785 – 1793 . Crossref Search ADS PubMed 41 R. Mosca , C. Pons , J. Fernández-Recio and P. Aloy , PLoS Comput. Biol. , 2009 , 5 , e1000490 . Crossref Search ADS PubMed 42 D. Kozakov , D. R. Hall , B. Xia , K. A. Porter , D. Padhorny , C. Yueh , D. Beglov and S. Vajda , Nat. Protoc. , 2017 , 12 , 255 – 278 . Crossref Search ADS PubMed 43 M. N. Wass , G. Fuentes , C. Pons , F. Pazos and A. Valencia , Mol. Syst. Biol. , 2011 , 7 , 469 . Crossref Search ADS PubMed 44 R. Zaidel-Bar , S. Itzkovitz , A. Ma'ayan , R. Iyengar and B. Geiger , Nat. Cell Biol. , 2007 , 9 , 858 – 867 . Crossref Search ADS PubMed 45 R. Zaidel-Bar and B. Geiger , J. Cell Sci. , 2010 , 123 , 1385 – 1388 . Crossref Search ADS PubMed 46 E. R. Horton , A. Byron , J. A. Askari , D. H. J. Ng , A. Millon-Frémillon , J. Robertson , E. J. Koper , N. R. Paul , S. Warwood , D. Knight , J. D. Humphries and M. J. Humphries , Nat. Cell Biol. , 2015 , 17 , 1577 – 1587 . Crossref Search ADS PubMed 47 E. R. Horton , J. D. Humphries , J. James , M. C. Jones , J. A. Askari and M. J. Humphries , J. Cell Sci. , 2016 , 129 , 4159 – 4163 . Crossref Search ADS PubMed 48 E. Zamir and B. Geiger , J. Cell Sci. , 2001 , 114 , 3583 – 3590 . PubMed 49 E. Mendelson , , Introduction to Mathematical Logic , Taylor & Francis , 1997 . 50 M. Yao , B. T. Goult , H. Chen , P. Cong , M. P. Sheetz and J. Yan , Sci. Rep. , 2014 , 4 , 4610 . Crossref Search ADS PubMed 51 C. Stark , B.-J. Breitkreutz , T. Reguly , L. Boucher , A. Breitkreutz and M. Tyers , Nucleic Acids Res. , 2006 , 34 , D535 – D539 . Crossref Search ADS PubMed 52 A. Ceol , A. Chatr-aryamontri , E. Santonico , R. Sacco , L. Castagnoli and G. Cesareni , Nucleic Acids Res. , 2007 , 35 , D557 – D560 . Crossref Search ADS PubMed 53 J. Das and H. Yu , BMC Syst. Biol. , 2012 , 6 , 92 . Crossref Search ADS PubMed 54 A. Ruepp , B. Waegele , M. Lechner , B. Brauner , I. Dunger-Kaltenbach , G. Fobo , G. Frishman , C. Montrone and H.-W. Mewes , Nucleic Acids Res. , 2010 , 38 , D497 – D501 . Crossref Search ADS PubMed 55 S. Orchard , M. Ammari , B. Aranda , L. Breuza , L. Briganti , F. Broackes-Carter , N. H. Campbell , G. Chavali , C. Chen , N. del Toro , M. Duesbury , M. Dumousseau , E. Galeota , U. Hinz , M. Iannuccelli , S. Jagannathan , R. Jimenez , J. Khadake , A. Lagreid , L. Licata , R. C. Lovering , B. Meldal , A. N. Melidoni , M. Milagros , D. Peluso , L. Perfetto , P. Porras , A. Raghunath , S. Ricard-Blum , B. Roechert , A. Stutz , M. Tognolli , K. van Roey , G. Cesareni and H. Hermjakob , Nucleic Acids Res. , 2014 , 42 , D358 – D363 . Crossref Search ADS PubMed 56 M. Boutros , F. Heigwer and C. Laufer , Cell , 2015 , 163 , 1314 – 1325 . Crossref Search ADS PubMed 57 L. C. Hwang , M. Gösch , T. Lasser and T. Wohland , Biophys. J. , 2006 , 91 , 715 – 727 . Crossref Search ADS PubMed 58 H. M. Wobma , M. L. Blades , E. Grekova , D. L. McGuire , K. Chen , W. C. W. Chan and D. T. Cramb , Phys. Chem. Chem. Phys. , 2012 , 14 , 3290 – 3294 . Crossref Search ADS PubMed 59 M. L. Blades , E. Grekova , H. M. Wobma , K. Chen , W. C. W. Chan and D. T. Cramb , Anal. Chem. , 2012 , 84 , 9623 – 9631 . Crossref Search ADS PubMed 60 K. G. Heinze , M. Jahnz and P. Schwille , Biophys. J. , 2004 , 86 , 506 – 516 . Crossref Search ADS PubMed 61 H. E. Grecco , S. Imtiaz and E. Zamir , Cytometry, Part A , 2016 , 89 , 761 – 775 . Crossref Search ADS 62 M. Wachsmuth , C. Conrad , J. Bulkescher , B. Koch , R. Mahen , M. Isokane , R. Pepperkok and J. Ellenberg , Nat. Biotechnol. , 2015 , 33 , 384 – 389 . Crossref Search ADS PubMed 63 D. Rognan , Med. Chem. Commun. , 2015 , 6 , 51 – 60 . Crossref Search ADS The journal is © The Royal Society of Chemistry 2018
TI - Modeling and simulating networks of interdependent protein interactions
JF - Integrative Biology
DO - 10.1039/c8ib00012c
DA - 2018-05-01
UR - https://www.deepdyve.com/lp/oxford-university-press/modeling-and-simulating-networks-of-interdependent-protein-xYIiD4jRun
SP - 290
VL - 10
IS - 5
DP - DeepDyve
ER -