Terms such as e-science, e-poster and e-health are now in common use. Specialized disciplines that enable rapid development in these fields of science are widely available. This paper presents an e-paper powered by the Collage Authoring Environment e-publication system, which is backed by the GridSpace2 distributed computing platform. The e-publication takes the form of a WWW page that, apart from traditional textual and graphical content, embeds an on-line software tool for analyzing the 3-D structure of a protein based on the hydrophobicity distribution in the protein body. The tool uses the GridSpace2 platform to carry out computations on the PL-Grid high-performance computing infrastructure. This work shows how this e-publication was built from these existing information technologies and e-infrastructure. The tool employs the "fuzzy oil drop" model, which assumes that the hydrophobicity distribution in a protein takes the form of a 3-D Gauss function. A protein whose hydrophobic core fully accords with the model, with all hydrophobic residues buried in the central part of the protein body and hydrophilic residues exposed to the water environment, could be highly soluble yet show no form of activity. This is why the discrepancies between the idealized and observed hydrophobicity distributions are presented as a profile that reveals the location of residues with local hydrophobicity excess and with local hydrophobicity deficiency. The distribution of these discrepancies appears to be specific and function-related. The e-publication makes available a tool to calculate the profile of any protein under consideration. The interpretation of the final results is specific to the particular protein.
KEYWORDS: hydrophobicity distribution, tertiary structure of proteins, e-publication, virtual laboratory

Introduction

Computational techniques are the basis of bioinformatics, a dynamically developing discipline. New tools that calculate the parameters needed to develop new models require the characterization of large data sets, and large-scale computing is necessary to verify newly introduced techniques. The large number of proteins of known 3-D structure on the one hand, and large-scale computing facilities on the other, give the opportunity to develop many e-disciplines. E-posters presenting the results of e-science, including e-health research, are popular nowadays. This paper presents an e-publication implemented by the DICE Team at Academic Computer Center AGH, as one of the pilot e-publications powered by the Collage Authoring Environment and the GridSpace2 distributed computing platform, both developed by the DICE Team within the scope of the PL-Grid project.

The e-publication has limited applicability. This way of presenting results applies solely to publications of the category called "tools". In particular, it can present a new tool for calculating characteristics of protein structure. The aim of such a paper is to present the tool in action. This paper presents a tool for calculating the "fuzzy oil drop" structure of a particular protein. The results are given in the form of a profile of the difference between the theoretical (idealized) and observed hydrophobicity distributions. The interpretation of the profile depends on the protein under consideration. The active form of the e-publication makes an individual calculation possible for any protein molecule. The method of interpreting the results is common to all proteins, although individual phenomena (biological specificity) can be found in a particular protein.
In consequence, the author receives his or her own results and adds the specific characteristics revealed by the specific profile.

Materials and Methods

GridSpace2 Virtual Laboratory and Collage Authoring Environment

The notion of a Virtual Laboratory was introduced to describe an interactive software platform that establishes a virtual environment for creating and conducting simulated experiments. Information technologies enable virtual environments in which whole groups of geographically distributed scientists work together on common problems. GridSpace2 is one such solution: a novel distributed computing platform aimed at scientific computations. It enables researchers to author, run, share and publish so-called virtual experiments over Grid-based resources and other computing infrastructures. GridSpace2 facilitates exploratory development of virtual experiments by means of scripts, which can be expressed in an open set of popular general-purpose languages (such as Ruby, Python or Perl) as well as more tailored domain-specific or purpose-specific tools and notations (such as Matlab, Mathematica or GnuPlot). The set is open, so nearly any kind of software package, even a custom one, can be seamlessly incorporated into the platform. Scientists can therefore easily assemble codes written in a variety of programming languages that support them at successive stages of their research process, ranging from data collection and pre-processing routines, through in-silico simulations, to data post-processing, analysis and visualization. Virtual experiments can then be embedded on arbitrary WWW sites, enabling e-publication authors to combine an on-line article with executable chunks of code and data items of virtual experiments. Thus, GridSpace2 reduces the hassle that collaborating, distributed scientific communities commonly face when dealing with distributed and high-performance computing environments.
The architecture of GridSpace2, depicted in Figure 1, was designed to address all those inherent aspects of distributed computing. The first aspect targeted by the GridSpace2 platform is access to multiple computing infrastructures. The platform conceals their complexity behind the abstraction layer of Executors. The high-level Executor API enables the incorporation of a variety of computing power providers, from single hosts, through remote hosts, clusters and data centers, to grids and clouds. The system is provider-agnostic: whether the computing power is managed by campus, academic, industrial or commercial parties, GridSpace2 can span a single virtual environment over all of them. Above this abstraction layer, experiments expressed in the Portable Experiment Format stay infrastructure-agnostic, so no additional effort is required to adapt their code to a specific distributed computing environment.

The other aspect covered by the GridSpace2 platform arises from the need to share, publish and reuse experiments among co-researchers, within teams, across collaborating communities or in the public domain. Owing to the GridSpace2 Web Layer, experiments can be authored and run using the Experiment Workbench web application, or embedded on an arbitrary web page using interactive Collage Widgets. Once written, experiments can easily be made available to a specified group of users without the additional burden of installing, porting or recompiling on end-users' workstations. As long as end-users are granted accounts on the computing infrastructure that hosts the experiment, it can be used by anyone equipped with nothing more than a web browser.

Figure 1. Architecture of the GridSpace2 platform: the GridSpace2 Executor API conceals the underlying computing infrastructure, thus providing an abstraction layer for infrastructure-agnostic, portable experiments; the GridSpace2 Web Layer enables access to the experiments through the Experiment Workbench web application and, in particular, access to experiment items such as code fragments and data files through the HTTP protocol from web browsers. In this way, experiment items can be embedded on an arbitrary web page.

The Collage Authoring Environment enables authors to export experiments in the form of web widgets that can be embedded on arbitrary web pages. Experiment items, either executable code snippets or input and output files, can be mashed up with HTML content using their respective embed codes. In this way, any HTML-based content such as wikis, blogs, Content Management Systems or custom web sites can be enriched with interactive Collage Widgets that enable running and in-place editing of the experiment as well as access to its input and output files. Being a web-oriented platform, GridSpace2 offers a convenient way of accessing data files stored on distributed computing resources through the HTTP protocol, thus making experiment input and result data even more pervasively accessible via web browsers. Aside from these most significant features, GridSpace2 handles further, no less important aspects such as confidential, authenticated and authorized access to sensitive research data, and accounting of utilized resources. The executable virtual experiment implementing the "fuzzy oil drop" method, described further in this work, was developed using the GridSpace2 platform presented above.

"Fuzzy Oil Drop" Model

The "fuzzy oil drop" model is described in detail elsewhere. The idea can be summarized briefly as follows. The structure of the hydrophobic core is assumed to be described by a 3-D Gauss function. This means that the highest hydrophobicity density is localized in the central part.
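The hydrophobicity density decreases with distance from the center of the ellipsoid, reaching a level close to zero on the surface of the ellipsoid, whose size is expressed by three independent $\sigma$ parameters (standard deviations), one for each direction:

$$\tilde{Ht}_j = \frac{1}{\tilde{Ht}_{sum}} \exp\left(\frac{-(x_j-\bar{x})^2}{2\sigma_x^2}\right) \exp\left(\frac{-(y_j-\bar{y})^2}{2\sigma_y^2}\right) \exp\left(\frac{-(z_j-\bar{z})^2}{2\sigma_z^2}\right) \tag{1}$$

This idealized hydrophobicity density is compared with the experimentally observed one. The experimentally observed hydrophobicity density distribution is calculated from the pair-wise hydrophobic interactions between the side chains of amino acids (according to the Levitt function). Here $r_{ij}$ is the distance between residues $i$ and $j$, $c$ is the cutoff distance and $H^r_i$ is the hydrophobicity of the $i$-th residue. The three versions of expressing the inter-residual hydrophobic interaction are as follows:

$$\tilde{Ho}_j = \frac{1}{\tilde{Ho}_{sum}} \sum_{i=1}^{N} \left(H^r_i + H^r_j\right) \begin{cases} 1 - \frac{1}{2}\left(7\left(\frac{r_{ij}}{c}\right)^2 - 9\left(\frac{r_{ij}}{c}\right)^4 + 5\left(\frac{r_{ij}}{c}\right)^6 - \left(\frac{r_{ij}}{c}\right)^8\right) & \text{for } r_{ij} \le c \\ 0 & \text{for } r_{ij} > c \end{cases} \tag{2}$$

$$\tilde{Ho}_j = \frac{1}{\tilde{Ho}_{sum}} \sum_{i=1}^{N} H^r_i \begin{cases} 1 - \frac{1}{2}\left(7\left(\frac{r_{ij}}{c}\right)^2 - 9\left(\frac{r_{ij}}{c}\right)^4 + 5\left(\frac{r_{ij}}{c}\right)^6 - \left(\frac{r_{ij}}{c}\right)^8\right) & \text{for } r_{ij} \le c \\ 0 & \text{for } r_{ij} > c \end{cases} \tag{3}$$

$$\tilde{Ho}_j = \frac{1}{\tilde{Ho}_{sum}} \sum_{i=1}^{N} \left[ H^r_j + H^r_i \begin{cases} 1 - \frac{1}{2}\left(7\left(\frac{r_{ij}}{c}\right)^2 - 9\left(\frac{r_{ij}}{c}\right)^4 + 5\left(\frac{r_{ij}}{c}\right)^6 - \left(\frac{r_{ij}}{c}\right)^8\right) & \text{for } r_{ij} \le c \\ 0 & \text{for } r_{ij} > c \end{cases} \right] \tag{4}$$

Comparing these equations reveals the difference in how the hydrophobicity of the interacting residues is treated. In the first version, the hydrophobicities of both interacting residues are treated equally: their sum is multiplied by the coefficient expressing the distance dependence. The second version takes into account only the hydrophobicity of the surrounding residues (with respect to the j-th residue). The third version multiplies the hydrophobicity of the surrounding residues (all i-th residues) by the distance-dependent coefficient while keeping the hydrophobicity of the j-th residue constant. The idealized and observed hydrophobicity densities are normalized (divided by the sum of the hydrophobicity density of all residues). This makes it possible to calculate the difference for each residue:

$$\Delta\tilde{H}_i = \tilde{Ht}_i - \tilde{Ho}_i \tag{5}$$

The observed distribution differs from the idealized one, revealing local hydrophobicity excess as well as local hydrophobicity deficiency.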
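The calculation in Eqs. (1), (2) and (5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of the published experiment: the function names, the use of the geometric centroid as the ellipsoid center, the default cutoff `c=9.0` and the choice to pass the three standard deviations in directly are all assumptions made here, and the per-residue hydrophobicity values `hydro` must come from a scale chosen by the user.

```python
import math


def levitt(r, c=9.0):
    """Distance-dependent coefficient used in Eqs. (2)-(4); c is an assumed cutoff."""
    if r > c:
        return 0.0
    q = (r / c) ** 2
    return 1.0 - 0.5 * (7 * q - 9 * q ** 2 + 5 * q ** 3 - q ** 4)


def theoretical(coords, sigmas):
    """Normalized 3-D Gauss density of Eq. (1), centered on the centroid (assumption)."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    raw = [
        math.exp(-sum((p[k] - center[k]) ** 2 / (2 * sigmas[k] ** 2) for k in range(3)))
        for p in coords
    ]
    total = sum(raw)
    return [v / total for v in raw]


def observed(coords, hydro, c=9.0):
    """Normalized observed density following the first variant, Eq. (2)."""
    raw = []
    for j, pj in enumerate(coords):
        raw.append(sum((hydro[i] + hydro[j]) * levitt(math.dist(pi, pj), c)
                       for i, pi in enumerate(coords)))
    total = sum(raw)
    return [v / total for v in raw]


def delta_profile(coords, hydro, sigmas, c=9.0):
    """Per-residue difference of Eq. (5): theoretical minus observed."""
    ht = theoretical(coords, sigmas)
    ho = observed(coords, hydro, c)
    return [t - o for t, o in zip(ht, ho)]
```

Because both distributions are normalized to one, the entries of the returned profile sum to zero. Negative entries correspond to local hydrophobicity excess and positive entries to local hydrophobicity deficiency.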
Hydrophobicity excess occurs when a hydrophobic residue is exposed to the water environment. Such an area is treated as potentially ready to interact with another protein molecule. Hydrophobicity deficiency is treated as identifying a cavity ready to interact with a hydrophobic ligand or substrate. In this paper, one chain of hemoglobin (PDB identifier 4HHB) was taken as the example. This chain interacts with the heme ligand, with the two chains of the other type and with the second chain of its own type. The influence of these external factors can be read from the results presented as a profile.

Results

The experiment presented in this work was authored using the GridSpace2 Experiment Workbench web application (shown in Figure 2). It offers a File Management Panel, where one can manage files hosted on the remote computing infrastructures; an Experiment Panel, where development of the experiment takes place; and a Console Area, where the output of the experiment is displayed. In the Experiment Panel, experiments are assembled from code fragments called snippets, which are interpreted by a chosen interpreter, and from input and output files called assets, which are processed and produced by the snippets, respectively.

Figure 2. User interface of Experiment Workbench: File Management Panel, Experiment Panel, and Console Area.

The experiment consists of several pipelined snippets, each processing its input files and producing its output files, which together form a graph of interconnected files and snippets. Code snippets are interpreted by interpreters executed on the Zeus high-performance computing cluster, which is incorporated into the PL-Grid e-infrastructure. The experiment can therefore be accessed and run by the wide community of PL-Grid users, which any scientist affiliated with a Polish research institute can join, along with his or her research fellows, through the PL-Grid Portal.
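As an illustration of what one stage of such a snippet pipeline might look like, the sketch below turns the text of a PDB file into per-residue coordinates for one chain. It is not the code of the published experiment: the function name `residue_centroids` and the choice to represent each residue by the centroid of all its atoms are assumptions made for this example. It relies only on the fixed-column layout of PDB `ATOM` records.

```python
def residue_centroids(pdb_text, chain_id):
    """Return [(res_seq, res_name, (x, y, z)), ...] for one chain of a PDB file.

    Each residue is represented by the centroid of its ATOM records,
    which is an assumption of this sketch.
    """
    sums = {}  # (res_seq, res_name) -> [sum_x, sum_y, sum_z, atom_count]
    for line in pdb_text.splitlines():
        # Skip everything except complete ATOM records of the requested chain.
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[21] != chain_id:
            continue
        key = (int(line[22:26]), line[17:20].strip())
        acc = sums.setdefault(key, [0.0, 0.0, 0.0, 0])
        acc[0] += float(line[30:38])
        acc[1] += float(line[38:46])
        acc[2] += float(line[46:54])
        acc[3] += 1
    # Dictionaries preserve insertion order, so residues stay in file order.
    return [
        (seq, name, (sx / n, sy / n, sz / n))
        for (seq, name), (sx, sy, sz, n) in sums.items()
    ]
```

The resulting coordinate list could then be written to an output file and consumed by the next snippet in the pipeline.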
For each experiment item, either snippet or file, a special piece of HTML code, called an embed code, can be generated and used to embed the corresponding Collage Widget on an arbitrary web site, as shown in Figure 3.

Figure 3. The experiment and its computation results exported in the form of Collage Widgets corresponding to each experiment item: code snippets, input and output assets.

The experiment discussed here was embedded on a web page containing the on-line version of the publication, thus enriching its traditional textual and graphical content with an interactive, executable experiment that readers can easily re-run, either to reproduce the original results provided by the author or to obtain new results for input data and parameters of their own. Moreover, readers gain a deeper insight into the implementation of the method, since its code is exposed to them and they can even modify it. The e-publication discussed here is hosted on the GridSpace2 Executable Papers Portal, which is based on the WordPress blog platform. The Executable Papers Portal is public, so anyone can read the text content of the articles; however, all interactive and executable Collage Widgets are displayed only after logging into the experiment executor, the Zeus cluster in this case, and are thus available only to PL-Grid users. The e-publication is made available as shown in Figure 4. Using the e-publication, the reader can run the computations for a protein of his or her choice by specifying the identifier under which it is stored in the PDB database. In the next paragraph we discuss the result obtained for hemoglobin (PDB identifier 4HHB).

Figure 4. The e-publication hosted on the GridSpace2 Executable Papers Portal and made available to PL-Grid users (screenshot of a fragment).

The three profiles expressing the hydrophobic interaction as it is distributed in the analyzed hemoglobin chain are presented in Figure 5.
The e-publication makes possible the automatic calculation of the theoretical and observed distributions of hydrophobicity in any protein under consideration. The ΔH̃ profile helps to identify residues representing local maxima (hydrophobicity deficiency) and residues representing local minima (hydrophobicity excess). The profile of the theoretical and observed hydrophobicity distributions is the final result of the e-publication calculation. The general interpretation of the profile is given above. However, the interpretation and biological meaning of the extreme values depend on the individual characteristics of the given protein. In the presented case, the hemoglobin chain shows how ligand binding and protein-protein interaction relate to the ΔH̃ profile, revealing the influence of interactions with external factors on the "fuzzy oil drop" structure of this protein.

Figure 5. The hydrophobicity profile of the analyzed hemoglobin chain in the structure deposited in PDB as 4HHB, taken as the example: theoretical (red), observed (green) and their difference (theoretical − observed, blue). Three different forms of the formula expressing the hydrophobic interaction are presented. Colored fragments on the horizontal axis denote the following: red, residues engaged in ligand binding; dark blue, residues engaged in protein-protein interaction.

The presentation given in Figure 5 allows the relation between the ΔH̃ values and the interaction status (complexation of a ligand or of a protein molecule) to be observed. It makes it possible to verify the hypothesis linking a specific deformation of the hydrophobic core with the specific biological activity of the protein under consideration.
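Locating the extreme values of the difference profile can be sketched as follows. This is a minimal illustration, assuming the profile is available as a plain list of per-residue differences (theoretical minus observed); the function name `local_extrema` and the simple strict-neighbor criterion are assumptions of this example, not the method used in the published experiment.

```python
def local_extrema(delta):
    """Return (maxima, minima) as lists of 0-based positions in the profile.

    A position counts as a local maximum (minimum) when its value is strictly
    greater (smaller) than both of its immediate neighbors; the two end
    positions are never reported.
    """
    maxima, minima = [], []
    for k in range(1, len(delta) - 1):
        if delta[k] > delta[k - 1] and delta[k] > delta[k + 1]:
            maxima.append(k)   # candidate for local hydrophobicity deficiency
        elif delta[k] < delta[k - 1] and delta[k] < delta[k + 1]:
            minima.append(k)   # candidate for local hydrophobicity excess
    return maxima, minima
```

The returned positions can then be compared with the residues known to take part in ligand binding or protein-protein interaction, which is the kind of comparison shown in Figure 5.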
Conclusions

Although the existing model of publishing is arguably sufficient for browsing new content, a modern researcher would like to gain a deeper insight into the implementation of a method, easily reproduce its results, experiment with it, and apply it in his or her own research. Accessibility, reproducibility and reusability can greatly accelerate scientific research by promoting collaborative sharing and widespread dissemination not only of results but of executable codes as well. The GridSpace2 platform, enhanced with the Collage Authoring Environment, offers a complete environment that addresses these issues and enables a new kind of experience in dealing with scientific publications.

E-publications can be seen as publications of the category called "tools" in some journals. The e-publication discussed in this work makes it possible to obtain the hydrophobicity profile of any protein deposited in PDB. The general interpretation of the profile is given. The detailed interpretation, which takes into account the specific features of the protein with respect to its biological activity, depends on the author's own insight. In our case, the added interpretation concerns the characteristics of residues representing local maxima or local minima with respect to ligand binding and/or protein-protein interaction. The engagement of residues in such activity in relation to the profile is one of many possible individual interpretations of the results obtained.

The aim of this paper is to present the e-publication as it is available in the PL-Grid system. The detailed biological interpretation of the results is beyond the scope of this paper. A wide discussion of the applicability of the "fuzzy oil drop" model is presented elsewhere. The e-publication in the form presented here introduces a dynamic form of results generation, supported by a general interpretation.
Bio-Algorithms and Med-Systems – de Gruyter
Published: Dec 1, 2012