The MEMPACK alpha-helical transmembrane protein structure prediction server

BIOINFORMATICS APPLICATIONS NOTE
Vol. 27 no. 10 2011, pages 1438–1439
doi:10.1093/bioinformatics/btr096
Structural bioinformatics
Advance Access publication February 23, 2011

Timothy Nugent*, Sean Ward and David T. Jones*
Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
Associate Editor: Burkhard Rost
*To whom correspondence should be addressed.

ABSTRACT

Motivation: The experimental difficulties of alpha-helical transmembrane protein structure determination make this class of protein an important target for sequence-based structure prediction tools. The MEMPACK prediction server allows users to submit a transmembrane protein sequence and returns transmembrane topology, lipid exposure, residue contacts, helix–helix interactions and helical packing arrangement predictions in both plain text and graphical formats using a number of novel machine learning-based algorithms.

Availability: The server can be accessed as a new component of the PSIPRED portal at http://bioinf.cs.ucl.ac.uk/psipred/.

Contact: d.jones@cs.ucl.ac.uk; t.nugent@cs.ucl.ac.uk

Received on November 25, 2010; revised on January 27, 2011; accepted on February 17, 2011

1 INTRODUCTION

Given the biological and pharmacological importance of transmembrane (TM) proteins and the difficulties associated with obtaining their crystal structures, the use of bioinformatics approaches to direct experimental work while furthering our understanding of their structure and function is essential. The MEMPACK prediction server applies a selection of machine learning-based tools to predict TM topology (the total number of TM helices, their boundaries and in/out orientation relative to the membrane), with the addition of lipid exposure, residue contacts and helix–helix interactions, culminating in prediction of the optimal helical packing arrangement using a force-directed algorithm. Figure 1 provides an example of some of the server output. The underlying tools have recently been shown to provide significant improvements in prediction accuracy compared with existing methods. It is hoped that this service will be of benefit to the broader scientific community.

2 METHODS

In order to predict TM protein topology, the server employs the MEMSAT3 (Jones, 2007) and MEMSAT-SVM (Nugent and Jones, 2009a) methods, which are based on neural network and SVM classifiers, respectively. Both methods use a dynamic programming algorithm to return a list of the most likely topologies, ordered by overall likelihood, and are also capable of predicting the presence of signal peptides and, in the case of MEMSAT-SVM, reentrant helices (membrane-penetrating helices that enter and exit the membrane on the same side, common in many ion channel families). The methods were trained using PSI-BLAST (Altschul et al., 1997) profile data generated from the Möller dataset (Möller et al., 2000), in the case of MEMSAT3, or a crystal structure-based training set, in the case of MEMSAT-SVM, and achieved maximum topology prediction accuracies of 78% (Möller set) and 89% (crystal structure set) when fully cross-validated using a jackknife test. The higher fraction of eukaryotic sequences in the Möller set compared with the relative bias toward prokaryotic sequences in the crystal structure set suggests that the strong performance of these two methods makes their combination ideally suited to whole-genome annotation of alpha-helical TM proteins.

3 PREDICTION OF THE OPTIMAL HELICAL PACKING ARRANGEMENT

Despite significant efforts to predict TM protein topology, comparatively little attention has been directed toward developing a method to help users determine possible 3D packing arrangements for helices. Our novel tool MEMPACK (Nugent and Jones, 2009b) uses a range of features to predict residue contacts and helix–helix interactions before using this information to predict the optimal helical packing arrangement.

First, an SVM classifier, trained using lipid-exposed residue profiles labelled according to molecular dynamics simulation data (Sansom et al., 2008), is used to predict per-residue lipid exposure. This information is then combined with PSI-BLAST profile data for each interacting residue and additional sequence-based features as input data for an SVM to predict residue contacts. Combining these results with predicted topology information, helix–helix interactions can then be predicted and used to optimally arrange the helices using a graph-based approach. By employing a force-directed algorithm, the method attempts to minimize edge crossing while maintaining uniform edge length, attributes common in native structures. Finally, a genetic algorithm is used to rotate helices in order to prevent residue contacts occurring across the longitudinal helix axis.

Under stringent cross-validation on a non-redundant test set of 74 protein chains, the method achieved 70% lipid exposure and 67% helix–helix interaction prediction accuracy (both significant improvements over existing methods) and was able to produce a helical packing arrangement which closely resembled a 2D slice taken from the crystal structure approximately normal to the likely plane of the lipid bilayer in 14 out of 23 cases, where all helix–helix interactions were successfully predicted. Of the remaining 51 cases, 34 were partially predicted while 17 had no predicted interactions, highlighting the challenges that remain for helix–helix interaction prediction in TM proteins.

Funding: Part of this work was supported by the BioSapiens project, which is funded by the European Commission within its FP6 Programme, under the thematic area ‘Life sciences, genomics and biotechnology for health’ (contract number LSHG-CT-2003-503265). Funding was also provided by the Biotechnology and Biological Sciences Research Council and the Wellcome Trust (grant number GR066745MA). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the article.

Conflict of Interest: none declared.

REFERENCES

Altschul,S.F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Jones,D.T. (2007) Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics, 23, 538–544.
Möller,S. et al. (2000) A collection of well characterised integral membrane proteins. Bioinformatics, 16, 1159–1160.
Nugent,T. and Jones,D.T. (2009a) Transmembrane protein topology prediction using support vector machines. BMC Bioinformatics, 10, 159.
Nugent,T. and Jones,D.T. (2009b) Predicting transmembrane helix packing arrangements using residue contacts and a force-directed algorithm. PLoS Comput. Biol., 6, e1000714.
Sansom,M.S. et al. (2008) Coarse-grained simulation: a high-throughput computational approach to membrane proteins. Biochem Soc. Trans., 36, 27–32.

© The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Fig. 1. Sample output for Archaerhodopsin-1, showing predicted transmembrane regions via MEMSAT and MEMSAT-SVM, the MEMSAT-SVM helix orientation cartoon and the predicted helical packing arrangement from MEMPACK. The plots underneath the schematic topology diagram show the raw scores generated by the SVMs that distinguish between TM helices and loop regions (H/L), inside loops and outside loops (iL/oL), reentrant loops or non-reentrant loops (RE/!RE) and signal peptides or non-signal peptides (SP/!SP). Colors in the MEMPACK cartoon indicate hydrophobic residues (blue), polar residues (red) and charged residues (green for negative, purple for positive). Lines between residues indicate a predicted interaction.
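The force-directed step described in Section 3 treats helices as graph nodes and predicted helix–helix interactions as edges. The following is a minimal Python sketch of that idea, not the MEMPACK implementation: the article does not specify the force model, so Fruchterman-Reingold-style forces (attraction d^2/k along edges, repulsion k^2/d between all pairs) are assumed here, and the function name, parameters and the four-helix example are hypothetical.

```python
import math
import random


def force_directed_layout(n_helices, interactions, iterations=500,
                          target_len=1.0, seed=0):
    """Place helices in 2D so that interacting helices sit close together.

    Assumed force model (not taken from the article): Fruchterman-Reingold
    style, with attraction d^2/k along edges and repulsion k^2/d between
    every pair of nodes, where k is target_len.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
           for _ in range(n_helices)]

    for it in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n_helices)]

        # Repulsion between every pair of helices keeps them from overlapping.
        for i in range(n_helices):
            for j in range(i + 1, n_helices):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                f = target_len ** 2 / d
                disp[i][0] += dx / d * f
                disp[i][1] += dy / d * f
                disp[j][0] -= dx / d * f
                disp[j][1] -= dy / d * f

        # Attraction along predicted interactions pulls partners together.
        for i, j in interactions:
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d ** 2 / target_len
            disp[i][0] -= dx / d * f
            disp[i][1] -= dy / d * f
            disp[j][0] += dx / d * f
            disp[j][1] += dy / d * f

        # Move each node, capping the step by a linearly cooling temperature.
        temp = 0.1 * (1.0 - it / iterations)
        for i in range(n_helices):
            mag = math.hypot(disp[i][0], disp[i][1])
            if mag > 0.0:
                scale = min(mag, temp) / mag
                pos[i][0] += disp[i][0] * scale
                pos[i][1] += disp[i][1] * scale
    return pos


def distance(pos, i, j):
    """Euclidean distance between the 2D positions of helices i and j."""
    return math.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1])


if __name__ == "__main__":
    # Hypothetical example: four helices with interactions 0-1, 1-2 and 2-3.
    edges = [(0, 1), (1, 2), (2, 3)]
    layout = force_directed_layout(4, edges)
    for helix, (x, y) in enumerate(layout):
        print(f"helix {helix}: x={x:.2f}, y={y:.2f}")
```

In this sketch, helices joined by a predicted interaction settle at similar separations, while helices with no predicted interaction are pushed apart. The helix rotation step performed by the genetic algorithm is not covered here.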

Bioinformatics, Volume 27 (10): 2 – Feb 23, 2011

Publisher
Oxford University Press
Copyright
© The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/btr096
pmid
21349872


Journal

Bioinformatics, Oxford University Press

Published: Feb 23, 2011
