Residue‐level prediction of DNA‐binding sites and its application on DNA‐binding protein predictions

Nitin Bhardwaj; Hui Lu

doi:10.1016/j.febslet.2007.01.086

Publisher: Wiley
Copyright: © 2015 Federation of European Biochemical Societies
eISSN: 1873-3468
DOI: 10.1016/j.febslet.2007.01.086

Abstract

Protein–DNA interactions are crucial to many cellular activities such as expression control and DNA repair. These interactions between amino acids and nucleotides are highly specific, and any aberration at the binding site can abolish the interaction entirely. In this study, we have three aims focusing on DNA‐binding residues on the protein surface: to develop an automated approach for fast and reliable recognition of DNA‐binding sites; to improve the prediction by distance‐dependent refinement; and to use these predictions to identify DNA‐binding proteins. We use a support vector machine (SVM)‐based approach that exploits the features of DNA‐binding residues to distinguish them from non‐binding residues. The features include the residue's identity, charge, solvent accessibility, average potential, the secondary structure it is embedded in, neighboring residues, and location in a cationic patch. These features, collected from 50 proteins, are used to train the SVM. Testing is then performed on a separate set of 37 proteins, much larger than any testing set used in previous studies. Proteins in the testing set share no more than 20% sequence identity with one another or with the proteins in the training set, which removes redundancy due to homology. The testing set also contains proteins from a DNA‐binding structural class not present in the training set. With the above features, an accuracy of 66% with balanced sensitivity and specificity is achieved without relying on homology or evolutionary information. We then develop a post‐processing scheme that improves the prediction using the relative location of the predicted residues. After post‐processing, balanced performance is achieved, with average sensitivity, specificity and accuracy of 71.3%, 69.3% and 70.5%, respectively. Average net prediction is also around 70%.
Finally, we show that the number of predicted DNA‐binding residues can be used to differentiate DNA‐binding proteins from non‐DNA‐binding proteins with an accuracy of 78%. The results presented here demonstrate that machine learning can be applied to the automated identification of DNA‐binding residues and that the success rate can be improved as more features are added. Such functional site prediction protocols can be useful in guiding subsequent work such as site‐directed mutagenesis and macromolecular docking.
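
The evaluation and post-processing ideas in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the refinement rule, so `refine_by_distance` (with its `radius` and `min_neighbors` parameters) is a hypothetical stand-in that simply drops predicted residues with no other predicted residue nearby. Likewise, `looks_dna_binding` and its `min_count` threshold are illustrative assumptions, and net prediction is assumed here to be the mean of sensitivity and specificity. The coordinates and labels are toy values, not data from the study.

```python
from math import dist  # Python 3.8+


def binding_metrics(y_true, y_pred):
    """Residue-level metrics; assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        # Assumption: net prediction = mean of sensitivity and specificity.
        "net_prediction": (sensitivity + specificity) / 2,
    }


def refine_by_distance(coords, predicted, radius=10.0, min_neighbors=1):
    """Hypothetical refinement: keep a predicted residue only if at least
    min_neighbors other predicted residues lie within radius of it."""
    refined = []
    for i, flag in enumerate(predicted):
        if not flag:
            refined.append(False)
            continue
        neighbors = sum(
            1
            for j, other in enumerate(predicted)
            if other and j != i and dist(coords[i], coords[j]) <= radius
        )
        refined.append(neighbors >= min_neighbors)
    return refined


def looks_dna_binding(predicted, min_count=3):
    """Hypothetical protein-level rule: call a protein DNA-binding when it
    has at least min_count predicted binding residues."""
    return sum(predicted) >= min_count


# Toy example: five residues on a line, the fourth is an isolated false positive.
coords = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (40, 0, 0), (80, 0, 0)]
y_true = [True, True, True, False, False]
raw = [True, True, False, True, False]

refined = refine_by_distance(coords, raw)
before = binding_metrics(y_true, raw)
after = binding_metrics(y_true, refined)
print(before["accuracy"], after["accuracy"])
print(looks_dna_binding(refined, min_count=2))
```

In this toy case the isolated prediction is removed, so specificity and accuracy rise while sensitivity is unchanged. A real refinement scheme could also add residues near a predicted cluster, which this sketch does not attempt.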

Journal

FEBS Letters – Wiley

Published: Mar 6, 2007
