LASSI: A lattice model for simulating phase transitions of multivalent proteins
Choi, Jeong-Mo;Dar, Furqan;Pappu, Rohit V.
2019-10-21 00:00:00
Many biomolecular condensates form via spontaneous phase transitions that are driven by multivalent proteins. These molecules are biological instantiations of associative polymers that conform to a so-called stickers-and-spacers architecture. The stickers are protein-protein or protein-RNA interaction motifs and / or domains that can form reversible, non-covalent crosslinks with one another. Spacers are interspersed between stickers and their preferential interactions with solvent molecules determine the cooperativity of phase transitions. Here, we report the development of an open source computational engine known as LASSI (LAttice simulation engine for Sticker and Spacer Interactions) that enables the calculation of full phase diagrams for multicomponent systems comprising coarse-grained representations of multivalent proteins. LASSI is designed to enable computationally efficient phenomenological modeling of spontaneous phase transitions of multicomponent mixtures comprising multivalent proteins and RNA molecules. We demonstrate the application of LASSI using simulations of linear and branched multivalent proteins. We show that dense phases are best described as droplet-spanning networks that are characterized by reversible physical crosslinks among multivalent proteins. We connect recent observations regarding correlations between apparent stoichiometry and dwell times of condensates to being proxies for the internal structural organization, specifically the convolution of internal density and extent of networking, within condensates. Finally, we demonstrate that the concept of saturation concentration thresholds does not apply to multicomponent systems where obligate heterotypic interactions drive phase transitions. This emerges from the ellipsoidal structures of phase diagrams for multicomponent systems and it has direct implications for the regulation of biomolecular condensates in vivo.

OPEN ACCESS
Citation: Choi J-M, Dar F, Pappu RV (2019) LASSI: A lattice model for simulating phase transitions of multivalent proteins. PLoS Comput Biol 15(10): e1007028. https://doi.org/10.1371/journal.pcbi.1007028
Editor: Ozlem Keskin, Koç University, TURKEY
Received: April 12, 2019
Accepted: October 1, 2019
Published: October 21, 2019
Copyright: © 2019 Choi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: All relevant data are within the manuscript and its Supporting Information files.
Funding: This work was supported by grants from the US National Science Foundation (MCB-1614766), http://nsf.gov, the Human Frontier Science Program (RGP0034/2017), http://www.hfsp.org, the US National Institutes of Health (5R01NS056114), http://nih.gov, and the St. Jude Children's Research Hospital through the research collaborative on membraneless organelles, http://www.stjude.org, to RVP. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: Rohit Pappu is a member of the Scientific Advisory Board of Dewpoint Therapeutics Inc. There are no competing interests from this affiliation.
PLOS Computational Biology | https://doi.org/10.1371/journal.pcbi.1007028 October 21, 2019
Simulations of phase transitions of multivalent proteins

Author summary

Spatial and temporal organization of molecular matter is a defining hallmark of cellular ultrastructure and recent attention has focused on membraneless organelles, which are also referred to as biomolecular condensates. Of interest are condensates that form via phase transitions that combine phase separation and networking of multivalent protein and nucleic acid molecules. Building on recently recognized analogies between associative polymers and multivalent proteins, we have developed and deployed LASSI, an open source computational engine that enables the calculation of architecture-specific phase diagrams for multivalent proteins. LASSI relies on a priori identification of stickers and spacers within a multivalent protein and mapping the stickers onto a 3-dimensional lattice. A Monte Carlo engine that incorporates a suite of novel and established move sets enables simulations that track density inhomogeneities and changes to the extent of networking among stickers as a function of protein concentration and interaction strengths. Calculation of distribution functions and other order parameters allows us to compute full phase diagrams for multivalent proteins modeled using a stickers-and-spacers representation on simple cubic lattices. These calculations allow us to rationalize experimental observations and open the door to the design of protein architectures with bespoke phase behavior. LASSI can be deployed to study the phase behavior of multicomponent systems, which allows us to make direct contact with the physical principles underlying cellular biomolecular condensates.

Introduction

Biomolecular condensates organize cellular matter into non-stoichiometric assemblies of proteins and nucleic acids [1]. Prominent condensates include nuclear bodies [2] such as nucleoli, nuclear speckles [3, 4], and germline granules [1, 5, 6]. Condensates also form in the cytoplasm. These include stress granules [7], membrane-anchored signaling clusters [8, 9], and bodies in post-synaptic zones [10].
All of these condensates share key features: (i) they range in size from a few hundred nanometers to tens of microns [1, 2, 11]; (ii) they are multicomponent entities comprising hundreds of distinct types of proteins and nucleic acids; and (iii) of the hundreds of different types of molecules that make up condensates, a small number are essential for the formation of condensates [1, 12]. The simplest feature that distinguishes proteins that are drivers of biomolecular condensates is the valence of interaction domains / motifs that can participate in non-covalent crosslinks [1, 12–14].

Biomolecular condensates can form and dissolve in an all-or-none manner [2, 11, 15]. The reversible formation and dissolution of condensates can be controlled by the concentrations of multivalent proteins that drive the formation of condensates; in simple two-component systems comprising macromolecules and solvent, condensates form when macromolecular concentrations cross macromolecule-specific threshold values known as saturation concentrations [15]. The transitions that characterize condensate formation bear the hallmarks of a sharp transition in macromolecular density, leading to the formation of a dense phase that is in equilibrium with a dilute phase. This type of transition, known as phase separation, sets up two or more coexisting phases to equalize the dense and dilute phase chemical potentials of the macromolecules across phase boundaries [15]. Phase separation is reversible and this reversibility can be achieved by (i) changes to concentrations of the driver macromolecules [9, 16], (ii) changes to solution conditions that alter the effective interaction strengths among driver molecules [17–20], (iii) altering saturation concentrations through ligand binding, a phenomenon known as polyphasic linkage [21, 22], or (iv) biological regulation such as post-translational modifications of proteins [8, 12, 23].
Recent studies have focused on uncovering the defining features of proteins [13, 15, 17–19, 24–40] and RNA molecules [41–43] that drive phase transitions. Protein and RNA molecules that drive phase transitions are biological instantiations of associative polymers [44] characterized by a stickers-and-spacers architecture [45]. Stickers contribute to a hierarchy of specific pairwise and higher-order interactions that are either isotropic or anisotropic whereas spacers control the concentration-dependent inhomogeneities in the densities of stickers around one another. Stickers can be hot spots or sectors [46] on the surfaces of folded proteins [15, 29] or short linear motifs within intrinsically disordered regions (IDRs) [15, 24, 47]. Spacers are typically IDRs that contribute through their sequence-specific effective solvation volumes to the interplay between density transitions (phase separation) and networking transitions that are better known as percolation [28, 29]. Spacers can also be folded domains that are akin to uniformly reactive colloidal particles, although this has not yet been explored. Proteins can be mapped onto the stickers-and-spacers architecture as linear multivalent proteins, branched multivalent proteins, or some combination of the two [13, 15].

Simple two-component systems comprise the solvent (which includes all components of the aqueous milieu) and a multivalent protein / RNA molecule. For fixed solution conditions, one can generate phase diagrams [25] as a function of protein concentration, the valence of stickers, the affinities of stickers, the sequence-specific effective solvation volumes of spacers, and the lengths / stiffness of spacers.
The phase diagram can be investigated by keeping the valence of stickers, the lengths of spacers, and effective solvation volumes of spacers fixed while varying the concentration of stickers and the affinities between stickers [29]. Changes to protein concentration will enable density fluctuations and above the saturation concentration, designated as $c_{\mathrm{sat}}$, the density inhomogeneities lead to separation of the system into coexisting phases. The concentration of multivalent proteins in the dilute and dense phases will be denoted as $c_{\mathrm{sat}}$ and $c_{\mathrm{dense}}$, respectively. For a given bulk concentration $c_{\mathrm{bulk}}$ that lies between $c_{\mathrm{sat}}$ and $c_{\mathrm{dense}}$, the fraction of molecules within each of the coexisting phases is governed by the lever rule [48].

Stickers also form reversible physical crosslinks and these crosslinks generate networks of inter-connected proteins. The number of proteins within the largest network of the system grows continuously as the protein concentration increases. Above a concentration threshold known as the percolation threshold and designated as $c_{\mathrm{perc}}$, the single largest network spans the entire system and this phenomenon is called percolation [49–51]. If the percolated networks have the rheological properties of viscoelastic fluids, the fluids are referred to as network fluids [15, 52].

Phase separation and percolation can be coupled to one another. The coupling will depend on the values of $c_{\mathrm{sat}}$, $c_{\mathrm{dense}}$, and $c_{\mathrm{perc}}$ relative to $c_{\mathrm{bulk}}$. If $c_{\mathrm{bulk}}$ is smaller than all of $c_{\mathrm{sat}}$, $c_{\mathrm{dense}}$, and $c_{\mathrm{perc}}$, the system is in a single dilute phase with no large molecular networks (Fig 1A). If $c_{\mathrm{bulk}} > c_{\mathrm{perc}}$ and $c_{\mathrm{perc}} < c_{\mathrm{sat}}$, then a system-spanning percolated network forms without phase separation (Fig 1B). However, the system undergoes phase separation and a dense phase forms as a percolated droplet if $c_{\mathrm{bulk}} > (c_{\mathrm{sat}}, c_{\mathrm{perc}})$ and $c_{\mathrm{sat}} < c_{\mathrm{perc}} < c_{\mathrm{dense}}$ (Fig 1C).
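The regime conditions and the lever rule stated above can be written down compactly. The following is a minimal sketch, not part of LASSI; the function names and the "unclassified" fallback are hypothetical, and the lever rule is written in its standard two-phase form under the assumption that the dilute and dense phases sit at the saturation and dense-phase concentrations, respectively.

```python
def classify_regime(c_bulk, c_sat, c_perc, c_dense):
    """Map a bulk concentration onto the three regimes sketched in Fig 1A-C."""
    if c_bulk < min(c_sat, c_perc, c_dense):
        return "dilute"        # below every threshold: no large networks
    if c_bulk > c_perc and c_perc < c_sat:
        return "percolated"    # system-spanning network, no phase separation
    if c_bulk > max(c_sat, c_perc) and c_sat < c_perc < c_dense:
        return "droplet"       # phase separation into a percolated dense phase
    return "unclassified"      # combinations the text does not discuss


def lever_rule(c_bulk, c_sat, c_dense):
    """Return (volume fraction of dense phase, fraction of molecules in it)."""
    if not c_sat <= c_bulk <= c_dense:
        raise ValueError("lever rule needs c_sat <= c_bulk <= c_dense")
    # Mass balance: c_bulk = (1 - f) * c_sat + f * c_dense, solved for f.
    volume_fraction = (c_bulk - c_sat) / (c_dense - c_sat)
    # Molecules in the dense phase relative to all molecules in the system.
    molecule_fraction = volume_fraction * c_dense / c_bulk
    return volume_fraction, molecule_fraction
```

The conditions are checked in the order in which the text lists them, so a parameter set that satisfies the percolation-without-phase-separation condition is reported as such even if the bulk concentration also exceeds the saturation concentration.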
Recent studies, using three-dimensional lattice models designed to mimic the poly-SH3 and poly-PRM systems of Li et al. [16], show that sequence-specific effective solvation volumes of linkers / spacers between folded domains directly determine whether phase separation and percolation are coupled or if percolation occurs without phase separation for linear multivalent proteins [28, 29]. The coupling between phase separation and percolation is controlled by the extent to which spacers / linkers preferentially interact with the surrounding solvent.

Theory [17, 24, 25, 27, 34, 53–59] and computations [28, 29, 43, 60–68] have important roles to play in modeling and describing the phase behavior of multivalent protein and RNA molecules. Theories provide analytical routes to explain experimental observations and to make testable predictions. On the other hand, simulations work around many of the simplifying assumptions that are needed to make theories analytically tractable. In doing so, they provide numerical routes to enable comparative assessments across different systems; they help in making testable predictions about phenomenology through “what if” calculations targeted toward specific systems; and they pave the way for designing systems with bespoke phase behavior.

Fig 1. Characteristic phases in the stickers and spacers formalism. (a) Dispersed solution phase where the polymers are uniformly mixed in solution. (b) Percolated fluid wherein the polymer chains form a percolated, system-spanning network through physical crosslinks among stickers. (c) Droplet wherein network formation also causes the polymers to form condensed phases. (d) Two-dimensional representation of the LASSI architecture. The beads with arms denote stickers where arms denote that the monomers are capable of orientational interactions, and the curved lines connecting the monomers represent phantom tethers, which are allowed to freely overlap (implicit spacer model). Different colors denote different sticker and spacer species respectively. Note that the physical bonds are allowed to overlap (dashed circle). For the rest of this work, physical bonds will not be labeled and will only be depicted as overlapping orientational arms. https://doi.org/10.1371/journal.pcbi.1007028.g001

Fig 2. Considerations that go into designing a coarse-grained model. As discussed in the text, the choice of a coarse-grained model has at least three ingredients. These include the type of conformational space (lattice or off-lattice), the nature of the interactions among entities that are represented in the coarse-grained description (isotropic, anisotropic or fluctuating fields), and the parameterization approach. LASSI, as described here, is based on a lattice model that uses anisotropic interactions and a phenomenological model. https://doi.org/10.1371/journal.pcbi.1007028.g002

Phase transitions are collective phenomena that involve highly cooperative transitions of large numbers of multivalent polymers. The collective interactions that drive phase transitions are captured in terms of a small number of order parameters that are similar across disparate systems and represent a generic coarse-graining of the underlying system that defines parameters such as the correlation length and the sizes of cooperative units. Accordingly, practical considerations of computational tractability and rigorous considerations of identifying the relevant collective coordinates mandate the use of coarse-grained models for simulations of phase transitions driven by multivalent protein and RNA molecules.
We focus here on multivalent proteins, although the methods we describe are readily adaptable to RNA molecules as well.

Coarse-graining, an essential aspect of making simulations of large numbers of multivalent proteins a tractable proposition, comes in different flavors [69]. For simplicity, we divide considerations that go into the development of a suitable coarse-grained model into three categories (Fig 2). These are (1) the type of model, (2) the types of interactions among the entities in the simulation, and (3) parameterization of the interaction potentials for the model of interest. For the type of model, the choice is between simulations performed using lattice models and simulations performed using off-lattice models. In either space, one or all of the molecules can be represented explicitly using architectures that represent coarse-grained mappings of the protein of interest. Next, the interactions among the units that make up each protein can be modeled as being isotropic or anisotropic. This is true of simulations where proteins of interest are modeled explicitly. In contrast, numerical instantiations of field theoretic models can also be brought to bear where only a single chain is modeled explicitly [60, 70]. The remaining protein and solvent molecules are modeled as fields whose fluctuations are concentration dependent [71]. The effects of all other molecules influence the phase behavior of the explicitly modeled single chain through interactions of the chain with the field. Finally, the choice of interaction potentials is the bedrock of every simulation. The functional forms and parameters for potentials can be derived using phenomenological considerations intended to enable calculations of the “what if” variety, an approach that is common practice in statistical and polymer physics.
One can also obtain system-specific parameters using information gleaned from atomistic simulations of smaller-scale facsimiles of the system of interest. These system-specific parameters are derivable using force matching methods pioneered by Voth and coworkers [72–77] or by prescribing a functional form for the potential that describes interactions in the coarse-grained space and employing machine learning methods to derive the relevant parameters [74, 78]. Finally, one can adopt approaches similar to the parameterization of molecular mechanics forcefields and develop a single transferable model that should be applicable to a large number of disparate systems.

Different coarse-grained simulations represent different combinations of model, interaction type, and parameterization. Two illustrative examples for deriving coarse-grained models for simulations of phase behavior of multivalent proteins come from the works of Ruff et al. [78] and Dignon et al. [64, 65]. Ruff et al. show how one can generate off-lattice models of bespoke resolution, with parameters for isotropic potentials learned using machine learning that leverages information gleaned from atomistic simulations of individual proteins and protein oligomers. Dignon et al. also use an off-lattice model based on isotropic potentials whose parameters are designed to be transferable across disparate intrinsically disordered proteins.

It is worth emphasizing that at this juncture, there is no valid reason to stipulate that one combination of approaches for deriving a coarse-grained model is superior to another combination. As noted by Das et al. [67, 68], all models have distinct strengths and limitations. However, for specific applications, some methods afford quantifiable computational advantages over others.
In our case, we are interested in uncovering conceptual nuances of phase diagrams for multicomponent systems that comprise multivalent proteins characterized by anisotropic interactions among domains / motifs. As noted above, these systems can be mapped onto a stickers-and-spacers architecture. The questions we are interested in answering pertain to the order parameters that describe phase behavior, the impact of chain connectivity and spacer effective solvation volumes on phase behavior, and the determinants of the shapes of phase diagrams of multicomponent systems where phase transitions are driven by heterotypic as well as homotypic interactions. In this context, it is noteworthy that lattice models have been adapted to model phase transitions for systems comprising different numbers of multivalent protein and RNA molecules [28–30, 43, 79–81].

In the present work, we provide a formal description of the design and implementation of system-specific lattice models for simulating phase transitions of multivalent proteins. The simulation engine, known as LASSI for LAttice simulation engine for Sticker and Spacer Interactions, formalizes the approaches that have been developed and deployed in recent studies [28–30, 79, 80]. Accordingly, LASSI combines a lattice model with anisotropic interactions among stickers and the model, at least in the current formalism, is derived based on phenomenological considerations (Fig 2). Ongoing work shows that a machine learning methodology known as CAMELOT [78] can be adapted for using LASSI as a tool to model sequence-specific phase behavior. We describe the design of LASSI, focusing first on the overall structure of the model, the Monte Carlo sampling, and their justification for generic multivalent proteins. We further describe the calculation of order parameters for quantifying phase separation and percolation.
Then, using two specific examples of linear and branched multivalent protein systems, we illustrate the deployment of LASSI to two biologically relevant systems. In both systems, we make a priori assumptions regarding the identities of stickers and spacers, which is a requirement for the deployment of LASSI. Although we focus here on systems with a few components, it should be emphasized that the design of LASSI is able to handle a wide range of multicomponent systems.

Materials and methods

Considerations that go into the development of a suitable lattice model include (a) the choice of the mapping between a specific multivalent protein of interest and a lattice representation, (b) the parameterization of the strengths and ranges of interactions for all unique pairs of beads and vacancies, (c) the design of move sets and acceptance criteria for Monte Carlo simulations that enable the sampling of local and collective motions of large numbers of lattice-instantiated multivalent proteins, (d) the efficient titration of key parameters such as protein concentrations and interaction strengths, and (e) the extraction of phase boundaries in terms of known and hidden collective parameters, which become the relevant order parameters for phase transitions of interest.

Generating lattice representations of multivalent proteins

For a given linear or branched multivalent protein, we first choose a suitable mapping between the protein degrees of freedom and a lattice representation. The conformational space is a simple cubic lattice with periodic boundary conditions used to mimic a macroscopic system.
Phase transitions represent the collective effects of large numbers of molecules, and simulations have to include at least $10^3$–$10^4$ protein molecules to observe facsimiles of these collective transitions in finite sized systems [82]. Further, we need to be able to test for the effects of finite size artefacts and this requires a titration of the effects of varying the numbers of molecules. Accordingly, the lattice has to be large enough to accommodate at least $10^3$ molecules of each type for the most dilute concentrations. Often, we might need to increase the number of molecules to be of $O(10^4)$. Accordingly, a one-to-one mapping between the protein degrees of freedom and a lattice representation would lead to a computationally intractable model. Instead, we adopt system-specific coarse-graining approaches, whereby the coarse-graining is guided by a priori rigorous or phenomenological knowledge of the identities of stickers versus spacers. For disordered proteins, the stickers within disordered regions often correspond to single amino acid residues or short linear motifs. For multivalent folded proteins, the stickers are either an entire protein domain or sectors on domain surfaces [28, 29]. Residues corresponding to spacers may either be modeled explicitly, where one or more spacer residues are modeled by a single bead on the lattice site, or be modeled as phantom tethers, where the intrinsic lengths of tethers are calibrated in terms of the numbers of lattice sites [28, 29]. In both cases, the tethers can stretch, bend, and rotate and these degrees of freedom contribute to density inhomogeneities that are the result of altered patterns of inter-sticker interactions.

LASSI and bond fluctuation models

The structure of LASSI is inspired by the bond fluctuation model (BFM) for lattice polymers [83].
This is a general lattice model for simulations designed to extract equilibrium conformational distributions and dynamical attributes of polymers in dilute solutions as well as dense melts. There are two versions of the BFM viz., the Carmesin-Kremer BFM or CK-BFM [84] and the Shaffer BFM or S-BFM [83]. Both models are based on the use of simple cubic lattices, which discretize the conformational space for polymers.

In the CK-BFM [84], each repeating unit or monomer within a polymer is modeled as a 3-dimensional cube where the 8 corners of the cube occupy lattice sites and bond vectors connect pairs of monomers. Overlap of monomers is associated with an energetic penalty, and each bond vector can have up to 108 distinct directions. The choice of bond vector set encodes the geometry of the polymer and places constraints on the bond lengths and bond angles. All other interactions are governed by the inter-monomer potentials, and evolution of the system through conformational space is driven by changes to the overall potential energy. In contrast, the S-BFM places each monomer on a single lattice site. Covalently bonded monomers are connected by bonds that are constrained to be of three types, leading to chains that have bonds of length 1, $\sqrt{2}$, or $\sqrt{3}$ in units of lattice size. Monte Carlo moves with suitable acceptance criteria can be designed for both types of BFMs. The simulations are used to generate equilibrium conformational distributions of lattice polymers in either dilute or dense phases. The move sets control the overall polymer dynamics and the acceptance of different types of moves and the calculation of correlation functions allows one to compute dynamical quantities for lattice polymers [83].
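The bond geometries of the two BFM variants can be checked by direct enumeration. The sketch below is illustrative only and is not taken from LASSI. It assumes the bond vector set commonly quoted for the Carmesin-Kremer model in the BFM literature (all signed permutations of six base vectors); that set is not spelled out in the text above, and the names `CK_BASE`, `ck_bonds`, and `sbfm_bonds` are hypothetical.

```python
from itertools import permutations, product


def signed_permutations(base):
    """All distinct vectors obtained by permuting and sign-flipping `base`."""
    vectors = set()
    for perm in set(permutations(base)):
        for signs in product((1, -1), repeat=3):
            vectors.add(tuple(s * p for s, p in zip(signs, perm)))
    return vectors


# Assumed CK-BFM base vectors (standard literature choice, not from the text).
CK_BASE = [(2, 0, 0), (2, 1, 0), (2, 1, 1), (2, 2, 1), (3, 0, 0), (3, 1, 0)]
ck_bonds = set().union(*(signed_permutations(b) for b in CK_BASE))

# S-BFM: one site per monomer; bonds join sites at most one step apart per axis.
sbfm_bonds = {v for v in product((-1, 0, 1), repeat=3) if v != (0, 0, 0)}
sbfm_squared_lengths = sorted({sum(x * x for x in v) for v in sbfm_bonds})

print(len(ck_bonds))           # number of distinct CK-BFM bond vectors
print(sbfm_squared_lengths)    # squared bond lengths allowed in the S-BFM
```

Under these assumptions the enumeration yields 108 CK-BFM bond vectors, and the S-BFM squared lengths 1, 2, and 3 correspond to the bond lengths 1, √2, and √3 mentioned in the text.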
If we were to use either of the established BFMs without modification, then each amino acid residue would be modeled as a monomer, and such an approach would be useful when the identities of stickers and spacers remain ambiguous. This approach underlies a different simulation engine known as PIMMS [43].

LASSI is a generalization of the S-BFM that also adapts features of the CK-BFM. Given a choice of the mapping for coarse-graining, each multivalent protein is described as a chain of non-overlapping monomers viz., beads that occupy sites on a 3-dimensional cubic lattice. Note that the choice of a single site per bead is similar to that of the S-BFM, although the bead, which is a sticker or spacer monomer, need not be the monomeric unit, i.e., an amino acid residue in the case of proteins. Each sticker monomer is linked to its adjacent sticker on the chain via either a phantom tether or a set of spacer beads that occupy individual lattice sites [28, 29]. A spacer / tether length of unity implies that adjacent monomers are within $\sqrt{3}$ lattice units of one another (Fig 1D). The choice of the spacer length will be sequence-specific or more precisely, specific to the architecture of the protein of interest.

Inter-monomer (sticker-sticker, sticker-spacer, and spacer-spacer) interactions are modeled as contact-based pairwise interactions. A sticker monomer can bind to another sticker monomer that occupies an adjacent lattice site with an interaction energy that depends on the types of both monomers. Monomers are considered to be adjacent to one another if they are within a lattice distance of $\sqrt{3}$. By this criterion, each lattice site occupied by a sticker monomer will have 26 adjacent lattice sites. This is reminiscent of the interaction geometry of a CK-BFM for each monomer.
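The adjacency criterion can be made concrete on a periodic lattice. The following is a minimal sketch under stated assumptions, not LASSI's own data structures: sites are integer triples on a cubic lattice of edge `L` with periodic boundary conditions, and the helper names `neighbors` and `are_adjacent` are hypothetical.

```python
from itertools import product

# The 26 displacement vectors with every component in {-1, 0, 1}, origin excluded.
NEIGHBOR_OFFSETS = [v for v in product((-1, 0, 1), repeat=3) if v != (0, 0, 0)]


def neighbors(site, L):
    """Return the 26 sites adjacent to `site`, wrapped onto an L x L x L lattice."""
    x, y, z = site
    return [((x + dx) % L, (y + dy) % L, (z + dz) % L)
            for dx, dy, dz in NEIGHBOR_OFFSETS]


def are_adjacent(a, b, L):
    """True if two distinct sites lie within sqrt(3) lattice units (minimum image)."""
    squared = 0
    for p, q in zip(a, b):
        d = abs(p - q) % L
        d = min(d, L - d)          # shortest separation across the periodic boundary
        squared += d * d
    return 0 < squared <= 3
```

For example, on a lattice with `L = 10` the sites `(0, 0, 0)` and `(9, 9, 9)` are adjacent because the periodic wrap places them one step apart along every axis, whereas `(0, 0, 0)` and `(2, 0, 0)` are not. The sketch assumes `L` is at least 3 so that the 26 wrapped neighbors are distinct.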
In the current implementation of LASSI, the interactions are mutually exclusive, implying that a sticker cannot interact simultaneously with more than one other sticker, even though there are 26 adjacent sites that the interaction partner can occupy. If the sticker in question is already engaged in another inter-monomer interaction with stickers or spacers, then the unoccupied sites of the sticker will be unavailable for interaction. The combination of the geometry of the interaction sites per monomer and the single occupancy constraint leads to anisotropic interactions between stickers. This feature is unique to LASSI and it is not incorporated in other variants of BFMs; this allows us to deploy LASSI for modeling heteropolymeric systems. In the context of LASSI, we note that stickers are distinguished by their ability to participate in anisotropic or isotropic interactions. In contrast, explicitly modeled spacer sites only participate in isotropic interactions with other spacer or sticker sites. Furthermore, the interaction strengths involving spacers are typically weaker than those involving stickers. However, it is worth emphasizing that these distinctions only matter inasmuch as LASSI allows us to capture a numerical instantiation of the stickers-and-spacers model. For simplicity, one might simply think of LASSI as a model that has sites that are differentiated by whether or not they can involve themselves in anisotropic interactions, by their intrinsic site valence (a variable that we do not titrate in this work), and by the comparative magnitudes of site-site interaction strengths.

Setup of simulations

A system with n multivalent proteins is in reality an n+1 component system since the solvent is the implicit component. In LASSI, sites that are not occupied by protein units automatically represent solvent sites.
Although the interaction potentials do not explicitly include terms between solvent and protein sites, the effective interaction strengths between pairs of protein units represent an averaging over protein-protein, protein-solvent, and solvent-solvent interactions. The solvent sites, i.e., the sites that are not occupied by protein units, represent contributions from the solvent to the overall translational and mixing entropies. Simulations are initiated by randomizing the positions of protein units, subject to the constraints of chain connectivity.

The parameters that are set at the start of each LASSI simulation include the total number of molecules $n_i$ of type $i$ and the size of the lattice $L$, from which we can calculate the total number $n$ of all protein components, $n = \sum_i n_i$, and the concentration or number density of each protein, $c_i = n_i / L^3$. The setup also includes stipulations for the architectures of each protein such as specification of the number of monomers per chain, the overall topology of each protein (linear vs. branched), the lengths of spacers, and the types of spacers (implicit / phantom vs. explicit) [28–30, 79, 80]. The number of monomers per molecule will equal the sum of the number of stickers and spacers if spacer residues are modeled explicitly. Alternatively, if spacers are modeled as phantom tethers, then the number of explicitly modeled monomers will equal the number of stickers. Specification of the energetics of the system includes specification of the simulation temperature in normalized units, homotypic and heterotypic interaction strengths between pairs of stickers, the energetic cost for the overlap of stickers, and the interaction strengths between sticker and spacer sites if the spacers are modeled explicitly.
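The bookkeeping described above, together with a randomized and connectivity-respecting initial placement, can be sketched as follows. This is an illustrative sketch and not LASSI's initialization routine. It assumes linear chains whose consecutive beads sit within √3 lattice units of one another (tether length of unity) on a periodic cubic lattice, and the function names `number_densities` and `place_chain` are hypothetical. The simple chain-growth procedure does not sample initial configurations uniformly; it only produces a valid non-overlapping starting point.

```python
import random

# Displacements to the 26 sites within sqrt(3) lattice units of a given site.
OFFSETS = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
           for dz in (-1, 0, 1) if (dx, dy, dz) != (0, 0, 0)]


def number_densities(counts, L):
    """Return (total molecule count, {type: n_i / L**3}) for copy numbers `counts`."""
    volume = L ** 3
    n_tot = sum(counts.values())
    return n_tot, {name: n_i / volume for name, n_i in counts.items()}


def place_chain(n_beads, L, occupied, rng, max_tries=1000):
    """Grow one linear chain onto free sites; restart if growth hits a dead end."""
    for _ in range(max_tries):
        start = tuple(rng.randrange(L) for _ in range(3))
        if start in occupied:
            continue
        chain, taken = [start], {start}
        while len(chain) < n_beads:
            x, y, z = chain[-1]
            candidates = (((x + dx) % L, (y + dy) % L, (z + dz) % L)
                          for dx, dy, dz in OFFSETS)
            free = [s for s in candidates
                    if s not in occupied and s not in taken]
            if not free:
                break                      # dead end: discard this attempt
            nxt = rng.choice(free)
            chain.append(nxt)
            taken.add(nxt)
        if len(chain) == n_beads:
            occupied.update(chain)         # commit only complete chains
            return chain
    raise RuntimeError("could not place chain; the lattice may be too crowded")
```

A typical use would create an empty `occupied` set and a seeded `random.Random` instance, then call `place_chain` once per molecule. Sites left out of `occupied` play the role of implicit solvent.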
Design of Monte Carlo move sets

Our goal is to compute architecture-specific phase diagrams for systems comprising one or more types of linear or branched multivalent proteins. This requires a simulation strategy that enables the sampling of the full spectrum of coexisting densities and networked states for multivalent proteins. Accordingly, the conformations of randomly initialized systems of proteins on a simple cubic lattice are sampled via a series of Markov Chain Monte Carlo (MCMC) moves that are designed to ensure efficient sampling of changes in protein density and networking while maintaining microscopic reversibility. We have developed and deployed a collection of moves and these are described below.

Monte Carlo sampling with biases

In LASSI, we have independent contributions from two main energetic sources. Monomer units are not allowed to overlap, and this can be described by a position-dependent energy $E_{\mathrm{pos}}$ where $E_{\mathrm{pos}} = 0$ or $\infty$. On the other hand, inter-monomer pairwise interactions also contribute to the total energy, and $E_{\mathrm{rot}}$ denotes the sum over all of the effective pairwise inter-monomer interaction energies. The subscript “rot” (rotational) indicates the fact that for a pair of nearest neighbor stickers their interaction energies are actually governed by their mutual orientations. Accordingly, the total system energy in a specific configuration $i$ is written as:

$E_i = E_{i,\mathrm{pos}} + E_{i,\mathrm{rot}}$ (1)

The equilibrium probability associated with configuration $i$ is given by the Boltzmann distribution as: p_i ∝ exp(