TANDEM: matching proteins with tandem mass spectra

BIOINFORMATICS APPLICATIONS NOTE
Vol. 20 no. 9 2004, pages 1466–1467
DOI: 10.1093/bioinformatics/bth092

Robertson Craig¹ and Ronald C. Beavis¹,²,*

¹Manitoba Centre for Proteomics, University of Manitoba, Winnipeg, MB, Canada R3T 2N2 and ²Institute for Biophysical Dynamics, University of Chicago, Chicago, IL 60637, USA

Received on September 26, 2003; accepted and revised on December 2, 2003
Advance Access publication February 19, 2004

*To whom correspondence should be addressed.

Bioinformatics 20(9) © Oxford University Press 2004; all rights reserved.

ABSTRACT

Summary: Tandem mass spectra obtained from fragmenting peptide ions contain some peptide sequence specific information, but often there is not enough information to sequence the original peptide completely. Several proprietary software applications have been developed to attempt to match the spectra with a list of protein sequences that may contain the sequence of the peptide. The application TANDEM was written to provide the proteomics research community with a set of components that can be used to test new methods and algorithms for performing this type of sequence-to-data matching.

Availability: The source code and binaries for this software are available at http://www.proteome.ca/opensource.html, for Windows, Linux and Macintosh OSX. The source code is made available under the Artistic License, from the authors.

Contact: rbeavis@proteome.ca

A significant branch of research in peptide mass spectrometry in the last 30 years has focused on experimentally determining peptide sequences using tandem mass spectrometry (MS/MS). The fundamental idea is as follows: isolate a particular intact peptide parent ion with one mass spectrometer; add electronic and vibrational energy to the isolated ion; and observe the resulting fragment ions with another mass spectrometer (Aebersold and Mann, 2003). If the resulting fragment ion spectrum contains signals that can be correlated with bond-breaking reactions along the peptide backbone, then the sequence of the peptide should be calculable because of the nearly unique masses associated with each side chain (leucine and isoleucine are isobaric and cannot be distinguished by this type of experiment).

For a variety of experimental reasons, this type of MS/MS spectrum frequently does not produce enough interpretable ions to sequence a peptide completely. An alternative method for determining the sequence using the same information was developed that involved comparing the experimentally observed fragment ions against those that would be expected for every known peptide sequence that could be generated from the known proteome of the organism under investigation (Rappsilber and Mann, 2002). This approach has the advantage that it only requires enough information to rank peptides in a proteome. The necessity to be able to call even short stretches of contiguous sequence is removed, making it possible to use spectra that cannot be manually interpreted in the usual manner.

For any complete proteome, there are too many potential peptide sequences to consider performing this type of correlation manually. Several software implementations have been developed to carry out this process, but at the moment they are all proprietary and the source code is not available (e.g. Perkins et al., 1999), which has limited the development of this type of technology. Our group has designed and implemented an open-source project that can be used for developing new algorithms to improve the results and efficiency of matching peptide sequences to MS/MS spectra. This project is called 'TANDEM'.

TANDEM was written to run from a command line, with an input XML file name as the only command line parameter. The code was created using a set of classes that perform the following tasks:

(1) read XML input parameter files;
(2) read protein sequences from FASTA files;
(3) read MS/MS spectra in common ASCII formats (DTA, PKL and Matrix Science);
(4) condition MS/MS spectra to remove noise and common artifacts;
(5) process peptide sequences with cleavage reagents, post-translational and chemical modifications;
(6) score peptide sequences; and
(7) create an XML output file capturing the best scoring sequences and some statistical distributions relevant to the scoring process.

The code for these objects was written in C++, using the Standard Template Library. The code was written so that it could cross-compile under Windows, Linux or OS X, with only minor differences, handled by preprocessor commands. The main difference between the platforms was the mechanism for starting worker threads, with specific calls made necessary because of the differences between the Windows and POSIX threading libraries.

The XML chosen for both input and output was BIOML (Fenyö, 1999), with mass spectra and other histograms represented using GAML (Duckworth, 2002; http://www.gaml.org/documentation.htm) as a namespace extension. The current implementation's API has 48 possible input parameters. To simplify the process of entering so many parameters, a two-step input parsing system was used. The input file specified on the command line can contain the name of a 'default file' that has values for all of the possible parameters. The input file parameters override the settings in the default file. By constructing a set of default files for common experimental situations, it is possible to create a very simple input file that only overrides the few parameters necessary for the experiment at hand. The sequence-containing FASTA files are specified in the input file by a 'taxon' name that is defined in a 'taxonomy' XML file. More than one FASTA file may be specified for any particular 'taxon' keyword. Examples for input parameter, default parameter and taxonomy files were included with the distribution release of the software. An example of how to use this software and a standard Common Gateway Interface to create an HTTP interface were also included.

TANDEM has been tested thoroughly, both by the authors and several other groups in academia and industry. It has also been used to generate new methods for carrying out common proteomics tasks (Craig and Beavis, 2003). The existence of this type of open-source project will hopefully give researchers a common platform for carrying out further exploration of the scientific issues currently outstanding in the application of large-scale proteomics to biological systems.

ACKNOWLEDGEMENTS

R.C.B. would like to thank D. Fenyö, H. Gaui and J. Wilkins for many useful discussions. We would also like to thank the Manitoba Centre for Proteomics, the Canadian Institutes for Health Research, Beavis Informatics Ltd and the Institute for Biophysical Dynamics at the University of Chicago for contributing funding.

REFERENCES

Aebersold, R. and Mann, M. (2003) Mass spectrometry-based proteomics. Nature, 422, 198–207.
Craig, R. and Beavis, R.C. (2003) A method for reducing the time required to match protein sequences with tandem mass spectra. Rapid Commun. Mass Spectrom., 17, 2310–2316.
Duckworth, J. (2002) An XML Data Model for Analytical Instruments.
Fenyö, D. (1999) The Biopolymer Markup Language. Bioinformatics, 15, 339–340.
Perkins, D.N., Pappin, D.J.C., Creasy, D.M. and Cottrell, J.S. (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 20, 3551–3567.
Rappsilber, J. and Mann, M. (2002) What does it mean to identify a protein in proteomics? Trends Biochem. Sci., 27, 74–78.
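
The two-step input parsing described in the article (a 'default file' supplying a value for every parameter, with the input file overriding only what it names) can be illustrated with a minimal C++ sketch. This is not TANDEM's actual code: the type name ParameterSet, the function resolve_parameters and the parameter labels are hypothetical, and the XML reading step is assumed to have already produced label/value pairs.

```cpp
#include <map>
#include <string>

// Hypothetical container: parameter label -> value, as it might look
// after the XML input and default files have been read.
using ParameterSet = std::map<std::string, std::string>;

// Two-step resolution: start from the default file's values, then let
// every value present in the input file replace the matching default.
ParameterSet resolve_parameters(const ParameterSet& defaults,
                                const ParameterSet& input) {
    ParameterSet resolved = defaults;
    for (const auto& entry : input) {
        resolved[entry.first] = entry.second;  // input wins over default
    }
    return resolved;
}

// Illustrative data only; these labels and values are made up.
ParameterSet example_defaults() {
    return {
        {"fragment mass error", "0.5"},
        {"cleavage reagent", "trypsin"},
        {"taxon", "yeast"},
    };
}

ParameterSet example_input() {
    // A short input file overrides a single setting.
    return {{"taxon", "human"}};
}
```

Calling resolve_parameters(example_defaults(), example_input()) yields a set in which the taxon comes from the input file and the other two values come from the defaults, which is the behaviour that lets a short input file stand in for a full parameter list.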

Bioinformatics, Volume 20 (9): 2 – Feb 19, 2004

Publisher
Oxford University Press
Copyright
Bioinformatics 20(9) © Oxford University Press 2004; all rights reserved.
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/bth092
pmid
14976030

Journal

Bioinformatics, Oxford University Press

Published: Feb 19, 2004
