PANDA-view: an easy-to-use tool for statistical analysis and visualization of quantitative proteomics data

Abstract

Summary: Compared with the numerous software tools developed for identification and quantification of -omics data, there remains a lack of suitable tools for both downstream analysis and data visualization. To help researchers better understand the biological meaning of their -omics data, we present an easy-to-use tool, named PANDA-view, for both statistical analysis and visualization of quantitative proteomics data and other -omics data. PANDA-view contains a variety of analysis methods, such as normalization, missing value imputation, statistical tests, clustering and principal component analysis, as well as the most commonly used data visualization methods, including an interactive volcano plot. Additionally, it provides user-friendly interfaces for protein-peptide-spectrum representation of quantitative proteomics data.

Availability and implementation: PANDA-view is freely available at https://sourceforge.net/projects/panda-view/.

Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

In the new era of life-omics, quantitative proteomics is becoming widespread with the rapid development of high-resolution mass spectrometers (MS) and superior experimental strategies (Schubert et al., 2017). Currently, there are many algorithms and tools for identification and quantification of -omics data. However, for most biological researchers who have few programming skills, downstream analysis, such as the statistical analysis of differentially expressed proteins (DEPs), remains a major challenge owing to a lack of suitable and easy-to-use tools (Cappadona et al., 2012). The few existing tools usually cannot perform both downstream analysis and data visualization with comprehensive methods.
For example, GProX (Rigbolt et al., 2011) and DanteR (Taverner et al., 2012) did not provide the necessary statistical tests and data visualization methods; Perseus (Tyanova et al., 2016) and GiaPronto (Weiner et al., 2017) included few normalization methods. Here, to break the barrier between -omics data (especially quantitative proteomics data) and the hidden biological/medical discoveries, we present an easy-to-use and lightweight tool, named PANDA-view, for statistical analysis and visualization of -omics data. PANDA-view is compatible with other -omics tools because it reads their results in comma-separated value (CSV) or tab-delimited text file format. It includes comprehensive methods for data normalization, imputation, statistical tests for DEP detection, unsupervised analysis and data visualization. Furthermore, it can provide a multi-level representation (from protein to MS spectrum) of the quantification results of PANDA (Chang et al., 2018).

2 Methods

PANDA-view is designed to provide comprehensive methods for statistical analysis and visualization of -omics data, including quantitative proteomics data (Fig. 1). See the Supplementary Note for detailed descriptions of every function in PANDA-view.

Fig. 1. Illustrations of data analysis and visualization functions in PANDA-view. (a) Icons of the analysis and visualization functions in the menu. (b) Examples of data visualizations.

2.1 Data upload and pre-process

The input data of PANDA-view can be any CSV or tab-delimited text file obtained from other tools. Once a file is chosen, all its column names are shown in a wizard graphical user interface (GUI). Users can choose specific columns to load into PANDA-view.
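
As a rough illustration of the column-selection step (not PANDA-view's actual code), the following Python sketch lists the column names of CSV or tab-delimited content and then loads only the chosen columns. The function names, the simple delimiter heuristic and the sample data are assumptions made for this example.

```python
import csv
import io


def detect_delimiter(header_line: str) -> str:
    """Guess tab vs comma from the header line (assumes only these two occur)."""
    return "\t" if header_line.count("\t") > header_line.count(",") else ","


def list_columns(text: str) -> list[str]:
    """Return the column names found in the first line of the file content."""
    first_line = text.splitlines()[0]
    delimiter = detect_delimiter(first_line)
    return next(csv.reader([first_line], delimiter=delimiter))


def load_columns(text: str, wanted: list[str]) -> list[dict[str, str]]:
    """Load only the requested columns; values are kept as strings."""
    delimiter = detect_delimiter(text.splitlines()[0])
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [name for name in wanted if name not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return [{name: row[name] for name in wanted} for row in reader]


# Hypothetical tab-delimited quantification table (made-up values).
sample = (
    "Protein\tSample_A\tSample_B\tNote\n"
    "P12345\t10.5\t11.2\tx\n"
    "Q67890\t8.1\t\ty\n"
)

print(list_columns(sample))                          # names a user could pick from
print(load_columns(sample, ["Protein", "Sample_B"]))  # only the chosen columns
```

Empty cells are kept as empty strings here, so a later imputation step can still recognize them as missing values.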
Further, when reading extremely large files, multiple threads are started automatically and the loaded data are displayed in the GUI in real time, to avoid potential freezes or crashes. PANDA-view includes five kinds of operations for users to explore and pre-process their data:
(i) Sort any column by numerical or character value using an efficient quicksort algorithm (with the median-of-three strategy).
(ii) Search any column using user-defined keys.
(iii) Filter any column with user-defined parameters.
(iv) Apply a logarithm transformation.
(v) Normalize the data, i.e. Z-score normalization, median normalization, maximum normalization, global normalization, interquartile range normalization, quantile normalization and variance stabilization normalization (Fig. 1a). Users can try different normalization methods and choose the most suitable one for their data (Valikangas et al., 2018).

2.2 Missing value imputation

It is known that missing values have a detrimental influence on the analysis of -omics data, such as DEP detection (Wang et al., 2017). Thus, missing values are usually imputed before further analysis. Based on the R statistical environment (https://www.r-project.org/), two missing value imputation methods are implemented in PANDA-view: multiple imputation and K-nearest neighbors (KNN) imputation.

2.3 Statistical analysis

As shown in Figure 1a, there are seven kinds of statistical tests in PANDA-view for DEP detection in different situations. (i) Parametric tests: t test (paired t test, independent t test and Welch’s t test) and ANOVA. (ii) Non-parametric tests: rank-sum test, permutation test and Fisher exact test. In particular, the Fisher exact test is used to analyze discrete values, such as protein spectral counts. (iii) Significance analysis of microarrays (SAM) (Tusher et al., 2001). Although it was originally proposed for microarray data, SAM remains popular for -omics data owing to its many variants. (iv) Multiple hypothesis test.
PANDA-view includes several prevalent methods to adjust the P-values, such as the Bonferroni method (Dunn, 1961), the Benjamini–Hochberg method (Benjamini and Hochberg, 1995) and the Benjamini–Yekutieli method (Benjamini and Yekutieli, 2001).

2.4 Unsupervised analysis of -omics data

For -omics data, PANDA-view incorporates three popular unsupervised analysis methods, i.e. hierarchical clustering, K-means clustering and principal component analysis (PCA). For PCA, in addition to the scree plot, biplot and prediction plot in 2D, PANDA-view also provides a 3D scatter plot and a 3D biplot for visualization of the principal components. See Figure 1b and Supplementary Figures S1–S4 for details.

2.5 Data visualization

Besides the various data analysis methods, PANDA-view also contains frequently used visualization methods, including the 2D/3D scatter plot, the line chart, the histogram and the boxplot (Fig. 1b). All these figures can be clicked and dragged to zoom in or out, and can be exported as images (JPG/PNG/BMP) or PDF files in a user-defined size and resolution. Moreover, PANDA-view implements an interactive volcano plot for DEP detection. Any data column can be searched using user-defined keys, and the retrieved results are highlighted in the volcano plot (Supplementary Fig. S5).

2.6 Multi-level representation of proteomic quantification results

PANDA-view has a special feature, i.e. displaying the quantitative analysis results of PANDA at multiple levels. It can automatically recognize PANDA’s outputs (protein/peptide/peptide ion quantification results). When the user right-clicks the corresponding index in the quantification result file, PANDA-view can track a protein to its quantified peptides and then to the corresponding peptide ions with their extracted ion chromatogram (XIC) views.
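
The protein-to-peptide-to-peptide-ion tracking described above can be pictured as a chain of index lookups. The Python sketch below is only an illustration under stated assumptions: the field names (`protein_id`, `peptide_id`, `charge`, `modification`) and the sample rows are hypothetical and do not reflect PANDA's real output format, and XIC data are left out.

```python
from collections import defaultdict

# Hypothetical rows; these are NOT PANDA's real output columns.
peptide_rows = [
    {"peptide_id": 1, "protein_id": "P1", "sequence": "AAK"},
    {"peptide_id": 2, "protein_id": "P1", "sequence": "LLR"},
    {"peptide_id": 3, "protein_id": "P2", "sequence": "GGK"},
]
ion_rows = [
    {"ion_id": 10, "peptide_id": 1, "charge": 2, "modification": ""},
    {"ion_id": 11, "peptide_id": 1, "charge": 3, "modification": "Oxidation"},
    {"ion_id": 12, "peptide_id": 2, "charge": 2, "modification": ""},
]


def build_index(rows, key):
    """Group rows by the value stored under `key`."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


peptides_by_protein = build_index(peptide_rows, "protein_id")
ions_by_peptide = build_index(ion_rows, "peptide_id")


def drill_down(protein_id):
    """Map each peptide of a protein to its (charge, modification) ions."""
    return {
        pep["sequence"]: [
            (ion["charge"], ion["modification"])
            for ion in ions_by_peptide.get(pep["peptide_id"], [])
        ]
        for pep in peptides_by_protein.get(protein_id, [])
    }


print(drill_down("P1"))
```

Keying each level by the index of the level above keeps every lookup a constant-time dictionary access, which is one plausible way to support interactive navigation of large result tables.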
This multi-level representation of the proteomic quantification results (protein, peptide, peptide ion and XIC) is expected to help users make an in-depth analysis of their data (Supplementary Fig. S6). Note that a peptide ion denotes a peptide with a specific charge and post-translational modification as identified by MS.

3 Conclusion

In summary, PANDA-view is an easy-to-use and multifunctional tool for statistical analysis and visualization of -omics data, especially quantitative proteomics data. It can handle both labeled and label-free quantitative data by offering comprehensive methods for data pre-processing and statistical tests for DEP detection, as well as clustering analysis and PCA. Besides the commonly used data visualization methods, PANDA-view implements a multi-level representation of the quantification results of PANDA, which helps end users to explore and manually validate their data in detail.

Funding

This work was supported by the National Key Research and Development Program of China [2017YFA0505002 and 2017YFC0906602] and the National Natural Science Foundation of China [21605159].

Conflict of Interest: none declared.

References

Benjamini Y., Hochberg Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodological), 57, 289–300.
Benjamini Y., Yekutieli D. (2001) The control of the false discovery rate in multiple testing under dependency. Ann. Statist., 29, 1165–1188.
Cappadona S. et al. (2012) Current challenges in software solutions for mass spectrometry-based quantitative proteomics. Amino Acids, 43, 1087–1108.
Chang C. et al. (2018) PANDA: A comprehensive and flexible tool for proteomics data quantitative analysis. bioRxiv, doi: 10.1101/332957.
Dunn O.J. (1961) Multiple comparisons among means. J. Am. Stat. Assoc., 56, 52–64.
Rigbolt K.T. et al. (2011) GProX, a user-friendly platform for bioinformatics analysis and visualization of quantitative proteomics data. Mol. Cell Proteomics, 10, O110 007450.
Schubert O.T. et al. (2017) Quantitative proteomics: challenges and opportunities in basic and applied research. Nat. Protoc., 12, 1289–1294.
Taverner T. et al. (2012) DanteR: an extensible R-based tool for quantitative analysis of -omics data. Bioinformatics, 28, 2404–2406.
Tusher V.G. et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA, 98, 5116–5121.
Tyanova S. et al. (2016) The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods, 13, 731–740.
Valikangas T. et al. (2018) A systematic evaluation of normalization methods in quantitative label-free proteomics. Brief Bioinform., 19, 1–11.
Wang J. et al. (2017) In-depth method assessments of differentially expressed protein detection for shotgun proteomics data with missing values. Sci. Rep., 7, 3367.
Weiner A.K. et al. (2017) GiaPronto: a one-click graph visualization software for proteomics datasets. Mol. Cell Proteomics, doi: 10.1074/mcp.TIR117.000438.

© The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Bioinformatics, Oxford University Press

Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press.
ISSN
1367-4803
eISSN
1460-2059
D.O.I.
10.1093/bioinformatics/bty408


Journal

Bioinformatics, Oxford University Press

Published: Oct 15, 2018
