Why have so few proteomic biomarkers “survived” validation? (Sample size and independent validation considerations)

Belinda Hernández; Andrew Parnell; Stephen R. Pennington

doi:10.1002/pmic.201300377



Publisher: Wiley
Copyright: © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISSN: 1615-9853
eISSN: 1615-9861
pmid: 24737731

Abstract

Proteomic biomarker discovery has led to the identification of numerous potential candidates for disease diagnosis, prognosis, and prediction of response to therapy. However, very few of these candidate biomarkers reach clinical validation and go on to be used routinely in clinical practice. One particular issue in biomarker discovery is that proteins identified as significantly changing in the initial discovery experiment fail to validate when subsequently tested on separate patient sample cohorts. Here, we seek to highlight some of the statistical challenges surrounding the analysis of LC-MS proteomic data for biomarker candidate discovery. We show that common statistical algorithms applied to data with small sample sizes can overfit and yield misleading misclassification rates and AUC values. A common solution to this problem is to prefilter variables (e.g., via ANOVA and/or correction methods such as Bonferroni or false discovery rate) to give a smaller dataset and reduce the size of the apparent statistical challenge. However, we show that this exacerbates the problem, yielding even higher (and therefore more misleading) performance metrics while reducing the predictive accuracy of the biomarker panel. To illustrate some of these limitations, we ran simulation analyses with known biomarkers. For our chosen algorithm (random forests), we show that these problems are substantially reduced if a sufficient number of samples is analyzed and the data are not prefiltered. Our view is that LC-MS proteomic biomarker discovery data should be analyzed without prefiltering and that increasing the sample size in biomarker discovery experiments should be a very high priority.
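The prefiltering problem described in the abstract can be illustrated with a small toy simulation. This is a minimal sketch under our own assumptions, not the authors' analysis: the data are pure noise (no true biomarkers, so honest accuracy should be about 0.5), the prefilter ranks features by a pooled two-sample t statistic (for two groups its square equals the one-way ANOVA F, so the ranking matches an ANOVA filter), and a nearest-centroid classifier stands in for random forests. All function names and parameter values (`n_per_class`, `p`, `k`, `reps`, `seed`) are hypothetical. The sketch compares prefiltering once on all samples before leave-one-out cross-validation (LOOCV) against repeating the prefilter inside every fold.

```python
import random
import statistics


def pooled_t(xs, ys):
    """Pooled-variance two-sample t statistic (t**2 equals the one-way ANOVA F for two groups)."""
    nx, ny = len(xs), len(ys)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    ss = sum((v - mx) ** 2 for v in xs) + sum((v - my) ** 2 for v in ys)
    se = (ss / (nx + ny - 2) * (1 / nx + 1 / ny)) ** 0.5
    return (mx - my) / se if se > 0 else 0.0


def top_k_features(X, y, k):
    """Hypothetical prefilter: indices of the k features with the largest |t|."""
    scores = []
    for j, col in enumerate(zip(*X)):  # iterate over feature columns
        a = [v for v, lab in zip(col, y) if lab == 1]
        b = [v for v, lab in zip(col, y) if lab == 0]
        scores.append((abs(pooled_t(a, b)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def nearest_centroid(X_train, y_train, x, feats):
    """Predict the class whose mean over the selected features is closest to x."""
    best_cls, best_dist = None, float("inf")
    for cls in (0, 1):
        rows = [r for r, lab in zip(X_train, y_train) if lab == cls]
        dist = sum((x[j] - statistics.fmean([r[j] for r in rows])) ** 2 for j in feats)
        if dist < best_dist:
            best_cls, best_dist = cls, dist
    return best_cls


def loocv_accuracy(X, y, k, select_inside_cv):
    """LOOCV accuracy; the prefilter runs either once on all samples or inside each fold."""
    leaky_feats = None if select_inside_cv else top_k_features(X, y, k)
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = top_k_features(X_tr, y_tr, k) if select_inside_cv else leaky_feats
        correct += nearest_centroid(X_tr, y_tr, X[i], feats) == y[i]
    return correct / len(X)


def simulate(n_per_class=10, p=500, k=10, reps=5, seed=1):
    """Mean LOOCV accuracy of both workflows on pure-noise data (true accuracy is 0.5)."""
    rng = random.Random(seed)
    leaky, honest = [], []
    for _ in range(reps):
        y = [0] * n_per_class + [1] * n_per_class
        X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in y]  # no real biomarkers
        leaky.append(loocv_accuracy(X, y, k, select_inside_cv=False))
        honest.append(loocv_accuracy(X, y, k, select_inside_cv=True))
    return statistics.fmean(leaky), statistics.fmean(honest)


if __name__ == "__main__":
    leaky_acc, honest_acc = simulate()
    print(f"prefilter on all samples, then LOOCV: {leaky_acc:.2f}")
    print(f"prefilter repeated inside each fold:  {honest_acc:.2f}")
```

Because the toy data contain no real biomarkers, any accuracy well above 0.5 from the first workflow comes entirely from selecting features using samples that are later used for testing; repeating the prefilter inside each fold removes that leak. Exact values depend on `seed`, so none are quoted here; `n_per_class` and `p` can be varied to explore how the gap behaves for different sample sizes and numbers of measured features.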

Journal

Proteomics – Wiley

Published: Jul 1, 2014
