Abstract

Motivation: Water molecules in protein binding sites play essential roles in biological processes. The popular 3D-RISM prediction method can calculate the solvent density distribution within minutes, but its output is difficult to convert into explicit water molecules.

Results: We present GAsol, a tool that finds the network of water molecules that best fits a particular 3D-RISM density distribution in a fast and accurate manner, and that outperforms other available tools by finding the globally optimal solution thanks to its genetic algorithm.

Availability and implementation: https://github.com/accsc/GAsol. BSD 3-clause license.

Contact: firstname.lastname@example.org

Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

The role of water molecules in protein binding sites has attracted considerable interest in recent years. Water is well known to play a key role in ligand recognition and in stabilizing protein structures. To complement experimental techniques and improve our understanding of active-site hydration, several computational approaches have been developed over the years (Bodnarchuk, 2016). Popular methods for locating water molecules in protein binding sites include WaterMap (Abel et al., 2008), GIST (Nguyen et al., 2012) and the Three-Dimensional Reference Interaction Site Model (3D-RISM) (Beglov and Roux, 1997), among others. It has also become clear that water-prediction tools can have a significant impact on medicinal chemistry programs; one recent example is the development of inhibitors of platelet-derived growth factor receptor β (Horbert et al., 2015).

Of particular relevance here, 3D-RISM (Kovalenko and Hirata, 1999) is a computational approach, rooted in the statistical-mechanical integral equation theories (IET) of liquids, that calculates the distribution of solvent molecules around a solute.
The most widely used 3D-RISM implementations can calculate the solvent distribution around a rigid solute within minutes, using only the solute structure and the solvent composition as input. Several algorithms have been developed to convert the continuous 3D-RISM distribution function into explicit water molecules, e.g. Placevent (Sindhikara et al., 2012), but they either struggle to find a truly global solution or cannot easily be applied to a wide range of targets. Here we present GAsol, a tool that finds the network of water molecules that best fits a particular 3D-RISM density distribution in a fast and accurate manner.

2 Methods and application

GAsol searches for the optimal network of water molecules globally, starting from the 3D-RISM solvent density. It uses a genetic algorithm and a desirability function (Fig. 1A) to avoid the local-minimum problem (Fig. 1B, Supplementary Tables S2–S3). Thanks to its built-in multiprocessor support, the analysis typically takes a couple of minutes on a modern workstation, and it requires only a grid file in DX format as input. The resulting network is written to a PDB file that can be visualized with any standard molecular viewer.

Fig. 1. (A) General overview of the GAsol algorithm. (B) Example of misprediction of two water molecules due to a local-minimum problem (PDB ID 5I80); the water predicted by GAsol is shown in green. (C) Overlay of the highly conserved water network in the 184 BRD4-BD1 bromodomain complexes. (D) Results of the validation procedure on BRD4 crystal structures compared with Placevent (standard algorithm). (Color version of this figure is available at Bioinformatics online.)

2.1 Detecting potential water sites

The number of water sites considered in the optimization is a critical parameter of the algorithm. We have implemented a two-step filter. First, a grid point is considered a potential water site only if its density distribution value reaches a minimum threshold (by default g(r) ≥ 5). Second, only grid points inside a sphere with a user-supplied centre and radius (e.g. the binding site) are retained. To facilitate this, users can supply a ligand of interest in PDB format, and the program automatically places the centre of the sphere at the geometric centre of the ligand.

2.2 Genetic algorithm

After selecting the potential water sites, the algorithm initializes a population of candidate solutions (chromosomes). Each chromosome contains one gene per available water site: a gene value of 1 means the site is occupied by a water molecule, and 0 means the site is empty. The initial population is then evolved over 10 000 generations, each consisting of selection, crossover and mutation. Selection uses a tournament scheme: three individuals are drawn at random (repetition allowed) and only the best of them is allowed to reproduce. In the crossover phase, the selected individuals are mated by exchanging the chromosome segment between two random crossover points. Finally, in the mutation step, random gene flips are introduced into the offspring with a low probability to maintain variability.
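The site filter of Section 2.1 and the evolutionary loop of Section 2.2 can be sketched in plain Python as below. This is an illustrative sketch, not GAsol's own code: the voxel representation `(xyz, g)`, the function names, the population size, the mutation rate and the random seed are all assumptions; only the g ≥ 5 default, the spherical region, the 0/1 genes, the three-way tournament with repetition, two-point crossover, bit-flip mutation and the 10 000 generations come from the text.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
Chromosome = List[int]  # one gene per candidate site: 1 = occupied, 0 = empty


def candidate_sites(voxels: Sequence[Tuple[Point, float]], centre: Point,
                    radius: float, threshold: float = 5.0) -> List[Point]:
    """Two-step filter: keep grid points with g >= threshold that also lie
    inside the user-defined sphere (centre, radius)."""
    return [xyz for xyz, g in voxels
            if g >= threshold and math.dist(xyz, centre) <= radius]


def tournament(pop: List[Chromosome], fit: List[float],
               rng: random.Random, k: int = 3) -> Chromosome:
    """Draw k individuals at random (repetition allowed); the fittest wins."""
    picks = [rng.randrange(len(pop)) for _ in range(k)]
    return pop[max(picks, key=lambda i: fit[i])]


def two_point_crossover(a: Chromosome, b: Chromosome,
                        rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap the segment between two distinct random cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def mutate(c: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in c]


def evolve(n_sites: int, fitness: Callable[[Chromosome], float],
           pop_size: int = 50, generations: int = 10_000,
           mutation_rate: float = 0.01, seed: int = 0) -> Tuple[Chromosome, float]:
    """Evolve random 0/1 chromosomes; return the best one ever evaluated."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_sites)] for _ in range(pop_size)]
    best: Chromosome = []
    best_fit = float("-inf")
    for gen in range(generations + 1):
        fit = [fitness(c) for c in pop]
        i = max(range(pop_size), key=lambda k: fit[k])
        if fit[i] > best_fit:
            best, best_fit = pop[i][:], fit[i]
        if gen == generations:  # final population evaluated; stop breeding
            break
        offspring: List[Chromosome] = []
        while len(offspring) < pop_size:
            p1, p2 = tournament(pop, fit, rng), tournament(pop, fit, rng)
            for child in two_point_crossover(p1, p2, rng):
                offspring.append(mutate(child, mutation_rate, rng))
        pop = offspring[:pop_size]
    return best, best_fit
```

For a quick check, `evolve(12, fitness=sum, generations=200)` runs the loop on a toy "maximize occupied sites" objective; in GAsol the fitness role is played by the desirability function described in Section 2.3.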
2.3 Desirability function

Before the algorithm starts generating solutions, the density distribution from the 3D-RISM calculation is transformed into a population function:

$$P(\vec{r}) = \rho_{\mathrm{bulk}}\, V_{\mathrm{voxel}}\, g(\vec{r})$$

where $\rho_{\mathrm{bulk}}$ is the density of the bulk solvent, $V_{\mathrm{voxel}}$ is the volume of one grid voxel and $g(\vec{r})$ is the density distribution function. Next, for each water site detected in the first phase of the program, we calculate the minimum number of surrounding voxels required to account for one unit of population; these voxels define a sphere around the site. Each water site is then scored by dividing the resulting population value (which should be close to 1.0) by the radius of that sphere. This scoring favours water sites with more compact, and therefore more probable, populations.

To score individual solutions, we have introduced a desirability function with two subcomponents and one penalty term (Supplementary Information). The first subcomponent measures how much of the population a solution accounts for: the values of all occupied water sites are summed and normalized by the sum of the values of all water sites in the solution space (occupied or not). The second subcomponent prevents the same part of the population from being counted more than once when water sites are close together: it equals 1 unless two or more occupied water sites lie closer than a threshold distance, in which case it is 0. A penalty term improves the efficiency of the algorithm with respect to this second subcomponent. Because infeasible solutions always have a desirability of 0, and random initial solutions usually contain several incompatible occupied water sites, the algorithm would otherwise waste many initial iterations on solutions it cannot tell apart. The penalty term is therefore defined as the weighted ratio of the number of incompatible water sites to the total number of sites in the chromosome.
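The scoring steps above can be sketched in plain Python as follows. This is a hedged illustration, not GAsol's implementation: the exact formulas are given in the Supplementary Information, so the nearest-first sphere growth, the "last voxel distance plus half a grid spacing" radius, the clash distance `min_dist`, the penalty `weight` and the multiplicative combination of the two subcomponents are all assumptions made for this sketch.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def population(g: float, rho_bulk: float, v_voxel: float) -> float:
    """P(r) = rho_bulk * V_voxel * g(r): expected number of waters in one voxel."""
    return rho_bulk * v_voxel * g


def site_score(site: Point, voxels: Sequence[Tuple[Point, float]],
               rho_bulk: float, v_voxel: float, spacing: float) -> float:
    """Compactness score of one water site (spacing must be > 0).

    Assumed growth rule: add voxels nearest-first until the accumulated
    population reaches 1; the sphere radius is the distance to the last
    voxel added plus half a grid spacing. Score = population / radius.
    """
    acc, radius = 0.0, spacing / 2
    for xyz, g in sorted(voxels, key=lambda v: math.dist(v[0], site)):
        acc += population(g, rho_bulk, v_voxel)
        radius = math.dist(xyz, site) + spacing / 2
        if acc >= 1.0:
            break
    return acc / radius


def desirability(chrom: Sequence[int], scores: Sequence[float],
                 coords: Sequence[Point], min_dist: float = 2.0,
                 weight: float = 1.0) -> float:
    """Coverage x feasibility - penalty (assumed way of combining the terms)."""
    total = sum(scores)
    covered = sum(s for s, gene in zip(scores, chrom) if gene)
    coverage = covered / total if total > 0 else 0.0   # subcomponent 1
    occupied = [i for i, gene in enumerate(chrom) if gene]
    clashing = set()                                   # incompatible site indices
    for a, i in enumerate(occupied):
        for j in occupied[a + 1:]:
            if math.dist(coords[i], coords[j]) < min_dist:
                clashing.update((i, j))
    feasible = 0.0 if clashing else 1.0                # subcomponent 2
    penalty = weight * len(clashing) / len(chrom)      # weighted clash ratio
    return coverage * feasible - penalty
```

With these choices every infeasible chromosome receives a negative score proportional to its number of clashing sites, so the genetic algorithm can still rank random initial solutions instead of facing a flat landscape of zeros.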
2.4 Evaluation datasets and results

To validate the tool, we selected a dataset of X-ray crystal structures of protein-ligand complexes with confirmed water networks: HIV-1 protease (PDB 2ZYE), neuraminidase (PDB 1NNC), bovine pancreatic trypsin (PDB 5PTI) and a series of 184 BRD4 bromodomain 1 (BRD4-BD1) complexes (Supplementary Table S1). The BRD4-BD1 series, which shares a highly conserved water network (Fig. 1C), was used to evaluate the robustness of the algorithm to small changes in the binding site. As a metric, we counted the number of predicted water molecules within 2.0 Å of a crystallographic water position (Supplementary Fig. S1). GAsol detects all the water molecules around the ligands in the HIV-1 protease, neuraminidase and bovine pancreatic trypsin systems. For the 184 BRD4-BD1 complexes, GAsol correctly identifies 94.3% of the key water molecules (Supplementary Fig. S1), a 90% improvement over a standard tool (Placevent; Fig. 1D). Moreover, the number of false positives, defined as predicted water molecules that do not match a crystallographic one, is comparable between GAsol and Placevent (Supplementary Fig. S2).

Conflict of Interest: none declared.

References

Abel R. et al. (2008) Role of the active-site solvent in the thermodynamics of factor Xa ligand binding. J. Am. Chem. Soc., 130, 2817–2831.
Beglov D., Roux B. (1997) An integral equation to describe the solvation of polar molecules in liquid water. J. Phys. Chem. B, 101, 7821–7826.
Bodnarchuk M.S. (2016) Water, water, everywhere… It's time to stop and think. Drug Discov. Today, 21, 1139–1146.
Horbert R. et al. (2015) Optimization of potent DFG-in inhibitors of platelet derived growth factor receptor β (PDGF-Rβ) guided by water thermodynamics. J. Med. Chem., 58, 170–182.
Kovalenko A., Hirata F. (1999) Potential of mean force between two molecular ions in a polar molecular solvent: a study by the three-dimensional reference interaction site model. J. Phys. Chem. B, 103, 7942–7957.
Nguyen C.N. et al. (2012) Grid inhomogeneous solvation theory: hydration structure and thermodynamics of the miniature receptor cucurbit  uril. J. Chem. Phys., 137, 044101.
Sindhikara D.J. et al. (2012) Placevent: an algorithm for prediction of explicit solvent atom distribution. Application to HIV-1 protease and F-ATP synthase. J. Comput. Chem., 33, 1536–1543.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: email@example.com
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
Bioinformatics – Oxford University Press
Published: Jan 15, 2018