ImmunomeBrowser: a tool to aggregate and visualize complex and heterogeneous epitopes in reference proteins

Abstract

Motivation: Datasets derived from different studies (e.g. MHC ligand elution, MHC binding, B/T cell epitope screening) often vary in their experimental approaches, in the sizes of the peptides tested (including partially overlapping and/or nested peptides) and in the number of donors tested.

Results: We present a customized application of the Immune Epitope Database’s ImmunomeBrowser tool, which can be used to effectively aggregate and visualize heterogeneous immunological data. User-provided peptide sets and their associated response data are mapped to a user-provided reference protein sequence. The output consists of tables and figures that represent the aggregated data as a Response Frequency (RF) score with an estimated confidence interval. This allows the user to visualize regions associated with dominant responses, together with their boundaries. The results are presented both as an interactive JavaScript-based web interface and in tabular format for a selected reference sequence.

Availability and implementation: The ‘ImmunomeBrowser’ has been a longstanding feature of the IEDB (http://www.iedb.org). The present application extends this tool to work with user-provided datasets rather than with the output of IEDB queries. This new server version of the ImmunomeBrowser is freely accessible at http://tools.iedb.org/immunomebrowser/.

1 Introduction

Datasets derived from different studies (e.g. MHC ligand elution, MHC binding, B/T cell epitope screening; Li Pira et al., 2010) often vary in the experimental approaches used, the sizes of the peptides tested and the number of donors tested. To enable simplified visualization of the aggregated data in the Immune Epitope Database (IEDB), we implemented an application called the ‘ImmunomeBrowser’ in the IEDB (Vita et al., 2015).
This tool aggregates all data relevant to a user query and allows one to visualize the known immune response to a specific antigen, while also illustrating knowledge gaps in a reference protein. Along the length of the reference protein, it reports immune reactivity in terms of response frequency (RF) and the number of subjects tested/responding and/or the number of independent assays performed. The tool was originally implemented in the results page of the database section of the IEDB. Its utility was demonstrated by Kim et al., who performed a meta-analysis of Hepatitis C virus (HCV) data available in the IEDB to present a broader picture of the immune reactivity and knowledge gaps in the reference protein sequences of the virus (Kim et al., 2012). However, the database version of the ImmunomeBrowser can only be used with data derived from IEDB queries, not with user datasets. To extend its usability to predicted epitopes, proprietary epitopes and other non-IEDB data, we implemented the ImmunomeBrowser as a stand-alone online tool that allows users to analyze and visualize immunodominant regions within their own datasets.

2 Materials and methods

2.1 Data input

The user provides peptide sequences, the response data for each peptide, the protein sequence(s) of interest and the desired sequence identity threshold, in specified formats. The peptide response data can be either pasted or uploaded as a file in whitespace-separated format with three columns: the peptide sequence; the number of subjects tested and/or number of assays performed; and the number of subjects responding and/or number of assays with a positive response. If these counts are not provided, the program automatically sets both the number of subjects tested (or assays performed) and the number of subjects responding (or positive assays) to ‘1’.
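The three-column input format described above can be parsed along the following lines. This is a minimal sketch, not the tool’s own code: `PeptideRecord` and `parse_peptide_input` are hypothetical names, and the handling of a row that gives only the tested count (the responder count then defaults to 1) is our assumption, since the text only covers the case where counts are missing altogether.

```python
from typing import NamedTuple


class PeptideRecord(NamedTuple):
    """One row of user input (hypothetical container, not the tool's own type)."""

    sequence: str
    tested: int  # subjects tested and/or assays performed (N)
    responded: int  # responding subjects and/or positive assays (R)


def parse_peptide_input(text: str) -> list[PeptideRecord]:
    """Parse whitespace-separated rows of: peptide [tested] [responded].

    Missing counts default to 1, following the behaviour described in 2.1.
    """
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        sequence = fields[0].upper()
        tested = int(fields[1]) if len(fields) > 1 else 1
        responded = int(fields[2]) if len(fields) > 2 else 1
        # Sanity check added for this sketch; the tool's own validation is not described.
        if responded > tested:
            raise ValueError(f"{sequence}: {responded} responders exceed {tested} tested")
        records.append(PeptideRecord(sequence, tested, responded))
    return records


rows = parse_peptide_input("SIINFEKL 10 3\nGILGFVFTL\n")
# The second row has no counts, so it receives tested=1 and responded=1.
print(rows)
```

Defaulting both counts to 1 means a peptide listed without counts is treated as a single positive observation, which is consistent with the rule stated above.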
Protein sequence(s) must be provided in FASTA format, and the sequence identity threshold is selected from a drop-down menu ranging from 10% to 100% in steps of 10%.

2.2 Mapping of epitopes

Each peptide is mapped to the user-provided reference protein sequence according to the selected identity threshold. The degree of identity is calculated from the alignment of the peptide within the reference sequence, and only peptides whose sequence identity is above the threshold are retained for further calculations.

2.3 RF and confidence interval calculations

The RF for a given peptide, and for each position of the source protein, is calculated as the total number of responding subjects and/or positive assays (R) divided by the total number of subjects tested and/or assays performed (N):

$$\mathrm{RF} = \frac{R}{N} \qquad (1)$$

A confidence interval (CI) is calculated to indicate how reliable the RF is, given the number of subjects tested. Depending on the sample size, the CI is calculated either with the Wilson score interval or with the binomial cumulative distribution function. For large samples (N ≥ 50), the lower and upper bounds are calculated as

$$\mathrm{CI} = \frac{\dfrac{R}{N} + \dfrac{z^{2}}{2N} \pm z\sqrt{\dfrac{\frac{R}{N}\left(1-\frac{R}{N}\right) + \frac{z^{2}}{4N}}{N}}}{1 + \dfrac{z^{2}}{N}} \qquad (2)$$

where z = 1.96 is the standard normal quantile for a two-sided 95% interval. For small samples (N < 50), the lower and upper bounds are calculated using the binomial cumulative distribution function.

2.4 Aggregation of RF data from different overlapping peptides

Aggregating the data is needed to identify the most frequently recognized regions: the aggregated RF at a given residue reflects how often peptides containing that residue are recognized overall, which yields an RF for each position of the reference sequence. To compute it, the numbers of subjects tested and/or assays performed (N) and the numbers of subjects responding and/or positive assays (R) are summed, for each mapped position of a given source protein, over all peptides covering that position.
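The mapping (2.2) and per-position summation (2.4) steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the ImmunomeBrowser’s implementation: the text does not specify the alignment method, so we assume an ungapped sliding-window comparison, and we treat the identity threshold as inclusive so that a 100% setting still keeps exact matches. `best_ungapped_match` and `aggregate_by_position` are hypothetical names.

```python
from collections import defaultdict


def best_ungapped_match(peptide: str, reference: str) -> tuple[int, float]:
    """Slide the peptide along the reference without gaps (assumed alignment).

    Returns (0-based start, fractional identity) of the best-scoring window,
    or (-1, 0.0) if the peptide is empty, longer than the reference, or shares
    no identical residue with any window.
    """
    best_start, best_identity = -1, 0.0
    k = len(peptide)
    if k == 0:
        return best_start, best_identity
    for start in range(len(reference) - k + 1):
        window = reference[start:start + k]
        identity = sum(a == b for a, b in zip(peptide, window)) / k
        if identity > best_identity:  # keep the first best-scoring window
            best_start, best_identity = start, identity
    return best_start, best_identity


def aggregate_by_position(peptides, reference: str, min_identity: float) -> dict[int, tuple[int, int]]:
    """Sum R and N over all retained peptides covering each reference position.

    peptides: iterable of (sequence, tested, responded) tuples.
    Returns {1-based position: (R, N)} for every covered position.
    """
    responded = defaultdict(int)
    tested = defaultdict(int)
    for sequence, n_tested, n_responded in peptides:
        start, identity = best_ungapped_match(sequence, reference)
        # Threshold treated as inclusive (assumption), so 100% keeps exact matches.
        if start < 0 or identity < min_identity:
            continue
        for pos in range(start + 1, start + len(sequence) + 1):
            responded[pos] += n_responded
            tested[pos] += n_tested
    return {pos: (responded[pos], tested[pos]) for pos in sorted(tested)}


reference = "MKTAYIAKQRQISFVKSHFSRQ"
peptides = [("TAYIAKQR", 10, 4), ("AKQRQISF", 5, 1)]
per_position = aggregate_by_position(peptides, reference, min_identity=0.8)
# Positions 7-10 are covered by both peptides, so they accumulate R = 5, N = 15.
```

Summing counts per position is what allows overlapping and nested peptides of different lengths to contribute to one RF profile; the RF and CI for each position then follow from Eqs. (1) and (2) applied to the summed R and N.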
The CI of the aggregated RF is calculated using the same equations described above.

2.5 Result display

The results are presented in two steps. The first step provides a summary of the epitopes and assays mapped back to the reference protein sequence (Fig. 1).

Fig. 1. Screenshots of example output from the customized application of the ‘ImmunomeBrowser’. (A) Table listing all epitopes mapped to the given reference protein sequence. (B) Area plot of the upper and lower CI bounds of the RF; the line plot shows the number of positive and negative assays, or of responding and non-responding subjects, along the positions of the reference protein. Hovering the mouse over any position of the reference protein in either plot displays the lower and upper bounds of the RF and the numbers of positive and negative assays/subjects (red rectangle).

For each protein, a table lists all epitopes with their mapped positions, the number of responding subjects/positive assays, the number of subjects tested/assays performed, and the RF with its lower and upper bounds at the 95% CI (Fig. 1A).
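The RF and 95% bounds listed in this table, and the per-position bounds plotted from aggregated counts, can be sketched as follows. This is a minimal sketch, not the tool’s code: Eq. (2) is implemented term by term for N ≥ 50, and we assume that the “binomial cumulative distribution function” method used for N < 50 refers to the exact Clopper-Pearson interval, which we obtain here by bisection on the binomial CDF. All function names are hypothetical.

```python
import math

Z = 1.96  # two-sided 95% standard normal quantile, as in Eq. (2)


def wilson_interval(r: int, n: int, z: float = Z) -> tuple[float, float]:
    """Wilson score bounds, written term by term as in Eq. (2)."""
    p = r / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    half = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - half) / denom, (centre + half) / denom


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_interval(r: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson bounds (assumed small-sample method) via bisection."""

    def solve(f, target: float) -> float:
        # f decreases monotonically in p on [0, 1]; find p with f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= r) = alpha/2, i.e. CDF(r - 1) = 1 - alpha/2.
    lower = 0.0 if r == 0 else solve(lambda p: binom_cdf(r - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= r) = alpha/2.
    upper = 1.0 if r == n else solve(lambda p: binom_cdf(r, n, p), alpha / 2)
    return lower, upper


def rf_with_ci(r: int, n: int) -> tuple[float, float, float]:
    """Return (RF, lower, upper): Wilson score for N >= 50, exact binomial for N < 50."""
    if n <= 0:
        raise ValueError("N must be positive")
    lower, upper = wilson_interval(r, n) if n >= 50 else exact_interval(r, n)
    return r / n, lower, upper


for r, n in [(3, 10), (25, 50)]:
    rf, lower, upper = rf_with_ci(r, n)
    print(f"R={r} N={n} RF={rf:.2f} 95% CI=({lower:.3f}, {upper:.3f})")
```

Bisection keeps the sketch dependency-free; with SciPy available, the same exact bounds are commonly obtained from beta-distribution quantiles (`scipy.stats.beta.ppf`).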
The second step provides aggregate plots of the mapped RF for each region of the reference protein: two plots show, along the length of the selected reference protein, the cumulative RF (upper and lower bounds of the RF) and the total number of results (positive and negative) (Fig. 1B).

3 Applications

The customized ImmunomeBrowser lends itself to several applications. As mentioned above, Kim et al. performed a meta-analysis of HCV data available in the IEDB (Kim et al., 2012; Vita et al., 2015). Users can now apply the tool to collate and meta-analyze data generated in multiple related studies. For example, the ImmunomeBrowser can be applied to natural ligand elution data, which contain largely overlapping peptides studied in different donors expressing different HLA molecules (Schellens et al., 2015; Shastri et al., 2002). For this purpose, response frequencies need to be combined across donors and for each HLA molecule (Alvarez et al., 2018). In this context, Vaughan et al. analyzed naturally processed ligand data curated in the IEDB to characterize the general features of the known data and to highlight existing knowledge gaps (Vaughan et al., 2017). The ImmunomeBrowser is also useful for analyzing immunogenicity testing of therapeutic proteins, in which overlapping peptides from a therapeutic protein are tested to evaluate unwanted immune responses (Asgari et al., 2015; Dhanda et al., 2018; Jawa et al., 2013; Salvat et al., 2017). The ImmunomeBrowser can thus aggregate immune response data from different peptides and/or peptide analogs spanning the length of the specified reference protein, even when they were tested in different donors and derived from different clinical studies. This lets users see which regions of a protein are most frequently recognized, and how reliable those estimates are, in a single view.
Acknowledgement

The authors acknowledge the support of Jason Greenbaum and Jason Yan at the La Jolla Institute for Allergy and Immunology in implementing the tool.

Funding

This work was supported by funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, under Contract No. HHSN272201200010C.

Conflict of Interest: none declared.

References

Alvarez B. et al. (2018) Computational tools for the identification and interpretation of sequence motifs in immunopeptidomes. Proteomics, doi: 10.1002/pmic.201700252.
Asgari S. et al. (2015) Rational design of stable and functional hirudin III mutants with lower antigenicity. Biologicals, 43, 479–491.
Dhanda S.K. et al. (2018) Development of a strategy and computational application to select candidate protein analogues with reduced HLA binding and immunogenicity. Immunology, 153, 118–132.
Jawa V. et al. (2013) T-cell dependent immunogenicity of protein therapeutics: preclinical assessment and mitigation. Clin. Immunol., 149, 534–555.
Kim Y. et al. (2012) A meta-analysis of the existing knowledge of immunoreactivity against hepatitis C virus (HCV). PLoS One, 7, e38028.
Li Pira G. et al. (2010) High throughput T epitope mapping and vaccine development. J. Biomed. Biotechnol., 2010, 325720.
Salvat R.S. et al. (2017) Computationally optimized deimmunization libraries yield highly mutated enzymes with low immunogenicity and enhanced activity. Proc. Natl. Acad. Sci. USA, 114, E5085–E5093.
Schellens I.M. et al. (2015) Comprehensive analysis of the naturally processed peptide repertoire: differences between HLA-A and B in the immunopeptidome. PLoS One, 10, e0136417.
Shastri N. et al. (2002) Producing nature’s gene-chips: the generation of peptides for display by MHC class I molecules. Annu. Rev. Immunol., 20, 463–493.
Vaughan K. et al. (2017) Deciphering the MHC-associated peptidome: a review of naturally processed ligand data. Expert Rev. Proteomics, 14, 729–736.
Vita R. et al. (2015) The immune epitope database (IEDB) 3.0. Nucleic Acids Res., 43, D405–D412.

© The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Bioinformatics, Oxford University Press

Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press.
ISSN
1367-4803
eISSN
1460-2059
D.O.I.
10.1093/bioinformatics/bty463
Published: Nov 15, 2018
