ggCyto: next generation open-source visualization software for cytometry

Abstract

Motivation: Open source software for computational cytometry has gained in popularity over the past few years. Efforts such as FlowCAP, the Lyoplate and Euroflow projects have highlighted the importance of standardizing both the experimental and the computational aspects of cytometry data analysis. The R/BioConductor platform hosts the largest collection of open source cytometry software. This collection covers all aspects of data analysis and provides infrastructure to represent and analyze cytometry data together with all relevant experimental, gating and cell population annotations, which enables fully reproducible data analysis. Data visualization frameworks to support this infrastructure have lagged behind.

Results: ggCyto is a new open-source BioConductor software package for cytometry data visualization. It is built on ggplot2 and provides ggplot-like functionality for the core BioConductor flow cytometry data structures. Its features include the ability to transform data and axes on-the-fly using cytometry-specific transformations, plot faceting by experimental metadata variables, and partial matching of channel, marker and cell population names to the contents of the BioConductor cytometry data structures. We demonstrate the salient features of the package using publicly available cytometry data, with complete reproducible examples in the Supplementary Material.

Availability and implementation: https://bioconductor.org/packages/devel/bioc/html/ggcyto.html

Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

Flow cytometry (FCM) is the primary assay for immune monitoring in clinical and research applications (Maecker et al., 2012). Analysis pipelines must handle preprocessing, quality control, analysis (i.e. cell clustering or manual partitioning into homogeneous groups) (O’Neill et al., 2013; Saeys et al., 2016) and visualization.

Proprietary platforms, including FlowJo (Ashland, OR), WinList, FCS Express and DIVA, are the de facto standards for end-to-end FCM data analysis. Programming environments such as MATLAB (MATLAB 7.0.4, Natick, MA: MathWorks) and Mathematica (Mathematica 9.0, Champaign, IL: Wolfram Research) provide functionality for data import and exploration [indeed, SPADE (Qiu et al., 2011) was initially developed for MATLAB], but they lack general abstractions for the cytometry-specific data structures that aid data analysis. Open-source projects such as R/BioConductor (R/BioC) (Gentleman et al., 2004; Ihaka and Gentleman, 1996) and Python provide FCM functionality through user-contributed packages (Frelinger et al., 2012). Currently 47 open source software packages in BioConductor are tagged ‘FlowCytometry’ (http://bioconductor.org/packages/release/BiocViews.html), but only flowViz (Sarkar et al., 2008) is visualization-centric, and it does not support the core BioConductor data structures used to store analyzed, gated and annotated single-cell FCM data (see Supplementary Material). Other packages focus on different aspects of automated analysis.

We introduce ggcyto, a BioConductor package for building reproducible FCM visualizations programmatically. It is built on ggplot2 (Wickham, 2009) and supports the core BioConductor cytometry data structures, making it compatible with any package that uses those structures (see Supplementary Material).

2 ggCyto

2.1 Basic principles

To construct a plot with ggcyto, users specify a data source (Fig. 1) and, analogous to ggplot2, map plot elements to variables in the data source. With ggcyto, however, users map plot axes to flow parameters (e.g. channels or markers), specify the cell population to plot, specify cytometry-specific axis transformations and, optionally, specify gates (i.e. the elements defining cell populations) to add to the plot. These elements are built up via layers and are referred to by name, mapping directly to quantities (i.e. data) in the data source. For ease of use, ggcyto supports partial string matching (Fig. 1 and Supplementary Material), which is particularly useful for identifying complex channel names or cell populations.

Fig. 1. ggcyto is compatible with ungated and gated data sources represented by the core BioConductor FCM data structures (flowSet/flowFrame and GatingSet/GatingHierarchy). Plots can be constructed using (1) the autoplot API or (2) the ggcyto API, the latter giving users more control. Custom layers control cytometry-specific plot elements including (3) data transformation

2.2 Availability

ggcyto is open-source and available on GitHub and BioConductor (https://github.com/RGLab/ggcyto/releases/tag/v1.9.5 and https://bioconductor.org/packages/devel/bioc/html/ggcyto.html).

2.3 Quick plotting with the autoplot API

The autoplot API is a quick way to build plots. It makes most plotting decisions for the user, based on domain knowledge and on information encoded in the data source (Fig. 1 and Supplementary Material). For example, passing a GatingHierarchy and a vector of cell population names (defined by gated cell populations in the GatingHierarchy) creates a faceted array (one panel for each sample) of two-dimensional density plots (using hexagonal binning) of the parent cell population, projected onto the dimensions of the gates defining those cell subsets (Fig. 1, Supplementary Material). The ‘CD3’ and ‘CD19’ populations shown in Figure 1 are named cell populations defined by gates in the GatingHierarchy. They should not be confused with markers of the same name.
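
The partial name matching described in Section 2.1, and the distinction drawn above between population names and marker names, can be illustrated with a short sketch. It is written in Python purely for illustration and is not ggcyto's R implementation. The helper names (`resolve_channel`, `resolve_population`), the matching rules (case-insensitive exact-then-substring matching for channels and markers, trailing-path matching for gated populations) and the example panel and gating paths are all hypothetical assumptions.

```python
"""Sketch: partial (fuzzy) name matching for cytometry parameters and gates.

Hypothetical helpers for illustration; this is NOT ggcyto's R implementation.
"""
from typing import Dict, List


def _unique(query: str, hits: List[str]) -> str:
    """Return the single hit; raise if the query matched zero or several names."""
    if len(hits) == 1:
        return hits[0]
    if not hits:
        raise LookupError(f"no match for {query!r}")
    raise ValueError(f"{query!r} is ambiguous: {sorted(hits)}")


def _matches(query: str, names: List[str]) -> List[str]:
    """Case-insensitive exact matches if any exist, otherwise substring matches."""
    q = query.lower()
    exact = [n for n in names if n.lower() == q]
    return exact or [n for n in names if q in n.lower()]


def resolve_channel(query: str, panel: Dict[str, str]) -> str:
    """Resolve a channel or marker query to a channel name.

    `panel` maps channel name -> marker description. Channel names are
    searched first; marker descriptions only if no channel matched.
    """
    hits = _matches(query, list(panel))
    if not hits:
        marker_hits = set(_matches(query, list(panel.values())))
        hits = [ch for ch, marker in panel.items() if marker in marker_hits]
    return _unique(query, hits)


def resolve_population(query: str, paths: List[str]) -> str:
    """Resolve a population query by its trailing path components, so
    "CD4" matches "/singlets/lymph/CD3/CD4" but not "/singlets/lymph/CD3".
    """
    tail = [part for part in query.strip("/").split("/") if part]
    hits = [p for p in paths if p.strip("/").split("/")[-len(tail):] == tail]
    return _unique(query, hits)


# Hypothetical panel and gating tree (not taken from the Lyoplate data).
PANEL = {
    "FSC-A": "FSC-A",
    "<B710-A>": "CD4 PerCP-Cy5.5",
    "<R660-A>": "CD8 APC",
    "<V450-A>": "CD3 V450",
}
PATHS = [
    "/singlets",
    "/singlets/lymph",
    "/singlets/lymph/CD3",
    "/singlets/lymph/CD3/CD4",
    "/singlets/lymph/CD3/CD8",
    "/singlets/lymph/CD19",
]

if __name__ == "__main__":
    print(resolve_channel("cd4", PANEL))         # -> <B710-A> (via the marker)
    print(resolve_population("CD3", PATHS))      # -> /singlets/lymph/CD3
    print(resolve_population("CD3/CD4", PATHS))  # -> /singlets/lymph/CD3/CD4
```

In this sketch an ambiguous query such as "CD" raises an error instead of silently picking one candidate. Requiring a unique match is one reasonable design choice, since a wrong guess would plot the wrong parameter. Population lookup is kept separate from marker lookup, so "CD3" names a gated population in `resolve_population` and a marker in `resolve_channel`.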

Two-dimensional densities are chosen by autoplot because the gates defining the CD3 and CD19 cell populations are two-dimensional. Where the gates defining a cell population are one-dimensional, a one-dimensional density is plotted instead. In this sense autoplot is context-aware, selecting geoms appropriate for visualizing the desired cell population. Analogously, autoplot can create plots from flowSet and flowFrame objects (for ungated data) or from GatingHierarchy and GatingSet objects (for gated data; Fig. 1 and Supplementary Material). For ungated data, the user specifies the channels or markers to visualize rather than a cell population, since no populations have been defined.

2.4 Customizing plots with cytometry-specific layers

The ggcyto() API provides greater flexibility and customization than autoplot (Fig. 1). When using ggcyto, the choice of layers and defaults, which autoplot makes automatically, is left to the user. Using ggcyto's cytometry-specific layers and geoms, the user builds the plot (Fig. 1 and Supplementary Material) to include the gates, overlays (e.g. backgating), data or axis transformations, cell subpopulations and cell subpopulation statistics of interest, and specifies how plots are faceted by metadata annotations (see Supplementary Material). The ggcyto API is particularly useful for projecting cell populations onto other markers (i.e. not necessarily those on which the populations are defined). Support for transformations in ggcyto is twofold: ggcyto can transform the underlying data (Fig. 1), or it can transform the axes using the transformation stored in the data source (Fig. 1). Both approaches are demonstrated in the Supplementary Material.
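
The two routes to a transformed plot described above can be contrasted in a minimal sketch, again in Python and not taken from ggcyto, whose transformations act through plot layers on the BioConductor data structures. It assumes an arcsinh transform with a cofactor of 150 (an illustrative choice; logicle and biexponential transforms are common alternatives), and all function names are hypothetical. Transforming the data maps raw intensities to plot positions at draw time. Transforming the axes assumes the data are already stored in transformed space and only maps tick positions back to raw units for labelling.

```python
"""Sketch: transforming the data vs. transforming only the axis labels.

Hypothetical helpers for illustration; this is NOT ggcyto's R implementation.
An arcsinh transform with cofactor 150 is assumed purely as an example.
"""
import math
from typing import Callable, List, Sequence, Tuple

Tick = Tuple[float, str]  # (position on the plotted scale, label text)
Fn = Callable[[float], float]


def make_asinh(cofactor: float = 150.0) -> Tuple[Fn, Fn]:
    """Return (forward, inverse) functions for an arcsinh transform."""
    def forward(x: float) -> float:
        return math.asinh(x / cofactor)

    def inverse(y: float) -> float:
        return math.sinh(y) * cofactor

    return forward, inverse


def transform_data(raw: Sequence[float], raw_breaks: Sequence[float],
                   forward: Fn) -> Tuple[List[float], List[Tick]]:
    """Approach 1: transform raw intensities at plot time.

    Ticks sit at forward(break) and are labelled with the raw break value.
    """
    positions = [forward(x) for x in raw]
    ticks = [(forward(b), f"{b:g}") for b in raw_breaks]
    return positions, ticks


def transform_axis(stored: Sequence[float], tick_positions: Sequence[float],
                   inverse: Fn) -> Tuple[List[float], List[Tick]]:
    """Approach 2: data are stored already transformed (the data source keeps
    the transformation); only tick labels are mapped back to raw units."""
    positions = list(stored)
    ticks = [(p, f"{inverse(p):g}") for p in tick_positions]
    return positions, ticks


if __name__ == "__main__":
    forward, inverse = make_asinh(cofactor=150.0)
    raw = [-50.0, 0.0, 300.0, 5000.0, 120000.0]  # made-up intensities
    breaks = [0.0, 100.0, 1000.0, 10000.0, 100000.0]

    pos1, ticks1 = transform_data(raw, breaks, forward)
    stored = [forward(x) for x in raw]  # what a pre-transformed source holds
    pos2, ticks2 = transform_axis(stored, [forward(b) for b in breaks], inverse)

    print(ticks1 == ticks2)  # True: same tick positions and labels either way
    print(ticks1)
```

Both approaches place points and ticks identically; they differ only in where the transformation is applied. The axis-only route avoids recomputing the data when a data source already holds transformed values and records the transformation used.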

3 Examples

The functionality of ggcyto is demonstrated using the Lyoplate dataset from FlowCAP 4 (Finak et al., 2016), available in the flowWorkspaceData R/BioConductor package and on the ImmuneSpace portal (Brusic et al., 2014) (see the Supplementary Material for a link to this data on ImmuneSpace), as well as the graft-versus-host disease (GvHD) data available in the flowCore R/BioConductor package. Reproducible examples with R code are in the Supplementary Material and available at http://rglab.org/ggcyto/. In the future, additional cytometry data may be made available via the more modern AnnotationHub or ExperimentHub resources (Morgan et al., 2016; Pasolli et al., 2017).

4 Conclusion

The ggcyto package provides a powerful, unified visualization interface to complex ungated or gated, annotated cytometry data structures and is a key component of a reproducible research workflow. Specifically, the package allows easy visualization of specific cytometry cell populations and gates, on-the-fly data and axis transformation, backgating visualization and easy faceting by study metadata to explore variability in an experiment. User-friendliness is made possible through fuzzy name matching, lazy data loading and context-sensitive behavior that aims to capture ‘what the user means to do’ in the most common cases. Areas for future development are highlighted in the Supplementary Material.

Acknowledgements

The authors wish to acknowledge the contributions of the computational flow community for testing and feedback on this software package.

Funding

This work was supported by an NIGMS grant [R01 GM118417-01A1 to G.F.] and a grant from the Bill and Melinda Gates Foundation [OPP1032317 to R.G.].

Conflict of Interest: none declared.

References

Brusic V., et al. (2014) Computational resources for high-dimensional immune analysis from the human immunology project consortium. Nat. Biotechnol., 32, 146–148.
Finak G., et al. (2016) Standardizing flow cytometry immunophenotyping analysis from the human ImmunoPhenotyping consortium. Sci. Rep., 6, 20686.
Frelinger J., et al. (2012) Fcm - a Python library for flow cytometry. In: Proceedings of the 11th Python in Science Conference (SciPy 2012), Austin, TX.
Gentleman R.C., et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol., 5, R80.
Ihaka R., Gentleman R. (1996) R: a language for data analysis and graphics. J. Comput. Graph. Stat., 5, 299–314.
Maecker H.T., et al. (2012) Standardizing immunophenotyping for the human immunology project. Nat. Rev. Immunol., 12, 191–200.
Morgan M., et al. (2016) AnnotationHub: client to access AnnotationHub resources. R package version 2.
O’Neill K., et al. (2013) Flow cytometry bioinformatics. PLoS Comput. Biol., 9, e1003365.
Pasolli E., et al. (2017) Accessible, curated metagenomic data through ExperimentHub. Nat. Methods, 14, 1023–1024.
Qiu P., et al. (2011) Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE. Nat. Biotechnol., 29, 886–891.
Saeys Y., et al. (2016) Computational flow cytometry: helping to make sense of high-dimensional immunology data. Nat. Rev. Immunol., 16, 449–462.
Sarkar D., et al. (2008) Using flowViz to visualize flow cytometry data. Bioinformatics, 24, 878–879.
Wickham H. (2009) ggplot2: Elegant Graphics for Data Analysis. Use R! Springer Science & Business Media, Heidelberg, Berlin, Germany.

© The Author(s) 2018. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Bioinformatics, Oxford University Press

Publisher: Oxford University Press
Copyright: © The Author(s) 2018. Published by Oxford University Press.
ISSN: 1367-4803
eISSN: 1460-2059
DOI: 10.1093/bioinformatics/bty441
Journal: Bioinformatics, Oxford University Press
Published: Nov 15, 2018
