Abstract

Motivation
Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. Several mapping resources are currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access.

Results
We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. The G2S API uses a REST-inspired design and can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools to bring 3D structures into genomic variant analysis.

Availability and implementation
The webserver and source code are freely available at https://g2s.genomenexus.org.

Contact
email@example.com

Supplementary information
Supplementary data are available at Bioinformatics online.

1 Introduction

With extensive recent large-scale genome sequencing projects such as The Cancer Genome Atlas (TCGA) (Weinstein et al., 2013) and the 1000 Genomes Project (Auton et al., 2015), a great number of germline and somatic genomic variants are being detected. Protein structure changes due to genome variation in protein-coding regions are of interest for genetic marker discovery and interpretation of disease mechanisms. Mapping genomic variants onto specific 3D protein structures is the first critical step in analyzing the variants in the context of protein structures.

Several resources have been developed to address the need to map protein positions to protein structures. SIFTS (Velankar et al., 2013) maps UniProt (Bateman et al., 2017) entries to Protein Data Bank (PDB) (Berman et al., 2000) entries and provides XML and flat files for download.
PDB utilizes SIFTS and provides a user web interface for accessing the mapping (http://www.rcsb.org/pdb/chromosome.do) (Berman et al., 2000; Prlic et al., 2016). G23D (Solomon et al., 2016) also provides a user web interface for mapping genomic variants onto 3D protein structures. However, none of them provides a web API for programmatic access to the sequence alignments and residue mapping.

Here, we present G2S, a web API that supports programmatic mapping and annotation of genomic variants on 3D protein structures. The following functionalities were implemented: (i) retrieving protein structure chains aligned to a primary protein sequence (a UniProt/Ensembl entry or a user-defined sequence); (ii) retrieving the mapping between genomic positions and structural positions; and (iii) retrieving the mapping between amino acid positions and structural positions. G2S provides a RESTful API. The pre-computed alignments are automatically updated weekly to keep up to date with the PDB structure archive. The G2S API and source code are publicly available at https://g2s.genomenexus.org/.

2 G2S pipeline

The G2S backend pipeline collects UniProt and Ensembl protein sequences and aligns them against carefully parsed PDB sequences using BioJava (Holland et al., 2008). Raw protein sequences were retrieved directly from the ATOM records of PDB files to avoid flaws and inconsistencies in the SEQRES and DBREF records of PDB files. The alignments of the protein sequences against the PDB sequences, computed by BLASTP (Altschul, 1997), were stored in a relational database. The pipeline updates the pre-computed alignments weekly as new PDB structures are added to RCSB PDB. The workflow and architecture of the G2S API are shown in Supplementary Figure S1.

3 G2S API

The G2S API accepts UniProt names, UniProt isoform names, human Ensembl names, genomic positions (GRCh37 and GRCh38) and user-defined protein sequences. The G2S API returns high-confidence pre-calculated mappings of the protein or residue aligned to the protein structures.
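To make residue-level mapping concrete, the following is a minimal Python sketch of how a position in a query protein sequence can be translated to a position in a structure chain, given a gapped pairwise alignment such as one produced by BLASTP. It is an illustration under stated assumptions, not the G2S implementation: the function name `map_residue`, its parameters and the toy sequences are hypothetical, and structure positions are treated as simple 1-based indices into the chain sequence rather than PDB author residue numbers.

```python
from typing import Optional


def map_residue(query_aln: str, target_aln: str,
                query_start: int, target_start: int,
                query_pos: int) -> Optional[int]:
    """Map a 1-based query position to a 1-based target position.

    query_aln / target_aln are the two gapped rows of a pairwise alignment
    ('-' marks a gap). query_start / target_start are the 1-based positions
    of the first aligned residue in each ungapped sequence.
    Returns None if the position lies outside the alignment or faces a gap.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("alignment rows must have equal length")
    # q and t hold the position of the last residue consumed in each row.
    q, t = query_start - 1, target_start - 1
    for q_char, t_char in zip(query_aln, target_aln):
        if q_char != "-":
            q += 1
        if t_char != "-":
            t += 1
        if q_char != "-" and q == query_pos:
            return t if t_char != "-" else None
    return None


# Toy alignment (hypothetical data, not taken from G2S):
# query residues 10-17 aligned to chain residues 3-10.
QUERY_ROW = "MKT-AYIAK"
CHAIN_ROW = "MKTQAY-AK"

print(map_residue(QUERY_ROW, CHAIN_ROW, 10, 3, 14))  # prints 8
```

In this toy case, query residue 14 (Y) maps to chain residue 8, while query residue 15 (I) faces a gap in the chain and has no structural counterpart, so the function returns `None`. Returning `None` rather than the nearest residue is a deliberate choice in this sketch: a variant at an unaligned position should not be silently placed on a neighbouring residue.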
For user-provided protein sequences, the G2S API calculates alignments against structure sequences on the fly. Several alignment quality metrics, such as E-value and bit-score, can be used as parameters in the API request to refine alignment results. The G2S API is a RESTful service, and all API endpoints provide fast real-time responses in JSON format. The details of the API endpoints, use cases and additional design details are provided on the web site (https://g2s.genomenexus.org) and in the Supplementary Material.

4 Use cases: cBioPortal and 3DHotspots

The cBioPortal for Cancer Genomics (http://cbioportal.org), a widely used resource for studying cancer genomics (Cerami et al., 2012; Gao et al., 2013), utilizes the G2S API to retrieve updated sequence-structure alignments and residue mapping to visualize cancer mutations in protein structures (see Fig. 1 for an example). The G2S API is also being used for detecting 3D mutational hotspots in cancer (https://github.com/knowledgesystems/mutationhotspots) (Gao et al., 2017).

Fig. 1. Protein structure visualization of SPOP mutations in the MSK-IMPACT study in the cBioPortal. The mutations were plotted along the primary sequence (top left); alignments from the primary sequence to PDB chains were plotted underneath (bottom left); the protein structure of the selected alignment was displayed with mutations highlighted in the structure (right).

5 Discussion

The G2S API provides an auto-updated real-time resource for retrieving residue-level sequence-structure alignments.
It fills a critical gap: no existing resource provides programmatic access to up-to-date protein structure alignments and residue mapping. The API was designed with high performance and real-time access in mind, so that third-party tools such as cBioPortal can deliver a smooth user experience when mapping variants to up-to-date protein structures and when supporting visualization and analysis of variants in the context of protein structures.

Funding

This work has been supported by Google Summer of Code 2016 [JW], National Institutes of Health grants (R33-GM078601 and R01-GM100701) [DX], the Marie-Josée and Henry R. Kravis Center for Molecular Oncology [NS, JG], a National Cancer Institute Cancer Center Core Grant (P30-CA008748), the Fund for Innovation in Cancer Informatics from the Brown Performance Group (www.BrownPerformance.com/ici) [NS, JG] and the Robertson Foundation [NS].

Conflict of Interest: none declared.

References

Altschul S.F. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Auton A. et al. (2015) A global reference for human genetic variation. Nature, 526, 68–74.
Bateman A. et al. (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res., 45, D158–D169.
Berman H.M. et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
Cerami E. et al. (2012) The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov., 2, 401–404.
Gao J. et al. (2013) Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal., 6, pl1.
Gao J. et al. (2017) 3D clusters of somatic mutations in cancer reveal numerous rare mutations as functional targets. Genome Med., 9, 4.
Holland R.C. et al. (2008) BioJava: an open-source framework for bioinformatics. Bioinformatics (Oxford, England), 24, 2096–2097.
Prlic A. et al. (2016) Integrating genomic information with protein sequence and 3D atomic level structure at the RCSB protein data bank. Bioinformatics (Oxford, England), 32, 3833–3835.
Solomon O. et al. (2016) G23D: online tool for mapping and visualization of genomic variants on 3D protein structures. BMC Genomics, 17, 681.
Velankar S. et al. (2013) SIFTS: Structure Integration with Function, Taxonomy and Sequences resource. Nucleic Acids Res., 41, D483–D489.
Weinstein J.N. et al. (2013) The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet., 45, 1113–1120.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: firstname.lastname@example.org
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
Bioinformatics – Oxford University Press
Published: Jan 27, 2018