Widespread and tissue-specific expression of endogenous retroelements in human somatic tissues
Larouche, Jean-David; Trofimov, Assya; Hesnard, Leslie; Ehx, Gregory; Zhao, Qingchuan; Vincent, Krystel; Durette, Chantal; Gendron, Patrick; Laverdure, Jean-Philippe; Bonneil, Éric; Côté, Caroline; Lemieux, Sébastien; Thibault, Pierre; Perreault, Claude
doi: 10.1186/s13073-020-00740-7
pmid: 32345368
Background: Endogenous retroelements (EREs) constitute about 42% of the human genome and have been implicated in common human diseases such as autoimmunity and cancer. The dominant paradigm holds that EREs are expressed in embryonic stem cells (ESCs) and germline cells but are repressed in differentiated somatic cells. Despite evidence that some EREs can be expressed at the RNA and protein levels in specific contexts, a system-level evaluation of their expression in human tissues is lacking.
Methods: Using RNA sequencing data, we analyzed ERE expression in 32 human tissues and cell types, including medullary thymic epithelial cells (mTECs). A tissue specificity index was computed to identify tissue-restricted ERE families. We also analyzed the transcriptome of mTECs in wild-type and autoimmune regulator (AIRE)-deficient mice. Finally, we developed a proteogenomic workflow combining RNA sequencing and mass spectrometry (MS) in order to evaluate whether EREs might be translated and generate MHC I-associated peptides (MAPs) in B-lymphoblastoid cell lines (B-LCLs) from 16 individuals.
Results: We report that all human tissues express EREs, but the breadth and magnitude of ERE expression are very heterogeneous from one tissue to another. ERE expression was particularly high in two MHC I-deficient tissues (ESCs and testis) and one MHC I-expressing tissue, mTECs. In mutant mice, we report that the exceptional expression of EREs in mTECs was AIRE-independent. MS analyses identified 103 non-redundant ERE-derived MAPs (ereMAPs) in B-LCLs. These ereMAPs preferentially derived from sense translation of intronic EREs. Notably, detailed analyses of their amino acid composition revealed that ERE-derived MAPs presented homology to viral MAPs.
Conclusions: This study shows that ERE expression in somatic tissues is more pervasive and heterogeneous than anticipated. The high and diversified expression of EREs in mTECs and their ability to generate MAPs suggest that EREs may play an important role in the establishment of self-tolerance. The viral-like properties of ERE-derived MAPs suggest that those not expressed in mTECs can be highly immunogenic.
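The abstract does not name the tissue specificity index that was used. As an illustration only, the sketch below computes the widely used tau index, which is an assumption and may differ from the authors' choice. The function name and the example expression values are hypothetical.

```python
def tau_index(expression):
    """Tau tissue specificity index for one gene or ERE family.

    `expression` holds one non-negative expression value per tissue.
    Returns 0.0 for perfectly uniform expression and 1.0 when expression
    is restricted to a single tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    if min(expression) < 0:
        raise ValueError("expression values must be non-negative")
    peak = max(expression)
    if peak == 0:
        # Convention chosen for this sketch: an unexpressed family is
        # reported as non-specific rather than raising an error.
        return 0.0
    # Sum of (1 - normalized expression) over tissues, scaled by n - 1.
    return sum(1.0 - x / peak for x in expression) / (n - 1)


# Hypothetical values across four tissues.
print(tau_index([5.0, 5.0, 5.0, 5.0]))   # uniform expression
print(tau_index([0.0, 0.0, 10.0, 0.0]))  # restricted to one tissue
```

Families with a tau close to 1 would be the tissue-restricted candidates under this definition; any cutoff applied to it is a separate choice not specified in the abstract.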
Sequence-based prediction of SARS-CoV-2 vaccine targets using a mass spectrometry-based bioinformatics predictor identifies immunogenic T cell epitopes
Poran, Asaf; Harjanto, Dewi; Malloy, Matthew; Arieta, Christina M.; Rothenberg, Daniel A.; Lenkala, Divya; van Buuren, Marit M.; Addona, Terri A.; Rooney, Michael S.; Srinivasan, Lakshmi; Gaynor, Richard B.
doi: 10.1186/s13073-020-00767-w
pmid: 32791978
Background: The ongoing COVID-19 pandemic has created an urgency to identify novel vaccine targets for protective immunity against SARS-CoV-2. Early reports identify protective roles for both humoral and cell-mediated immunity for SARS-CoV-2.
Methods: We leveraged our bioinformatics binding prediction tools for human leukocyte antigen (HLA)-I and HLA-II alleles that were developed using mass spectrometry-based profiling of individual HLA-I and HLA-II alleles to predict peptide binding to diverse allele sets. We applied these binding predictors to viral genomes from the Coronaviridae family and specifically focused on T cell epitopes from SARS-CoV-2 proteins. We assayed a subset of these epitopes in a T cell induction assay for their ability to elicit CD8+ T cell responses.
Results: We first validated HLA-I and HLA-II predictions on Coronaviridae family epitopes deposited in the Virus Pathogen Database and Analysis Resource (ViPR) database. We then utilized our HLA-I and HLA-II predictors to identify 11,897 HLA-I and 8,046 HLA-II candidate peptides which were highly ranked for binding across 13 open reading frames (ORFs) of SARS-CoV-2. These peptides are predicted to provide over 99% allele coverage for the US, European, and Asian populations. From our SARS-CoV-2-predicted peptide-HLA-I allele pairs, 374 pairs identically matched what was previously reported in the ViPR database, originating from other coronaviruses with identical sequences. Of these pairs, 333 (89%) had a positive HLA binding assay result, reinforcing the validity of our predictions. We then demonstrated that a subset of these highly predicted epitopes were immunogenic based on their recognition by specific CD8+ T cells in healthy human donor peripheral blood mononuclear cells (PBMCs). Finally, we characterized the expression of SARS-CoV-2 proteins in virally infected cells to prioritize those which could be potential targets for T cell immunity.
Conclusions: Using our bioinformatics platform, we identify multiple putative epitopes that are potential targets for CD4+ and CD8+ T cells, whose HLA binding properties cover nearly the entire population. We also confirm that our binding predictors can predict epitopes eliciting CD8+ T cell responses from multiple SARS-CoV-2 proteins. Protein expression and population HLA allele coverage, combined with the ability to identify T cell epitopes, should be considered in SARS-CoV-2 vaccine design strategies and immune monitoring.
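The abstract reports over 99% allele coverage without stating how coverage was calculated. The sketch below shows one common way to estimate the share of a population carrying at least one covered HLA allele. It assumes Hardy-Weinberg proportions within each locus and independence between loci (which ignores linkage disequilibrium); both are assumptions of this sketch, not the authors' stated method. The function name, locus labels, and allele frequencies are hypothetical.

```python
def population_coverage(covered_freq_by_locus):
    """Estimate the fraction of individuals with >= 1 covered allele.

    `covered_freq_by_locus` maps a locus name to a dict of
    {allele: population frequency} for the alleles that the peptide
    set is predicted to bind. Under Hardy-Weinberg proportions, a
    person lacks every covered allele at a locus with probability
    (1 - F)^2, where F is the summed frequency of covered alleles.
    """
    uncovered = 1.0
    for locus, freqs in covered_freq_by_locus.items():
        total = sum(freqs.values())
        if total < 0.0 or total > 1.0 + 1e-9:
            raise ValueError(f"frequencies at {locus} must sum to a value in [0, 1]")
        # Both chromosomes must carry an uncovered allele at this locus.
        uncovered *= (1.0 - min(total, 1.0)) ** 2
    return 1.0 - uncovered


# Hypothetical frequencies, for illustration only.
example = {
    "HLA-A": {"A*02:01": 0.30, "A*01:01": 0.20},
    "HLA-B": {"B*07:02": 0.25, "B*08:01": 0.25},
}
print(round(population_coverage(example), 4))
```

Because coverage compounds across both chromosomes and across loci, moderate per-locus allele frequencies can already yield high population coverage under these assumptions.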
Whole-genome sequence association analysis of blood proteins in a longitudinal wellness cohort
Zhong, Wen; Gummesson, Anders; Tebani, Abdellah; Karlsson, Max J.; Hong, Mun-Gwan; Schwenk, Jochen M.; Edfors, Fredrik; Bergström, Göran; Fagerberg, Linn; Uhlén, Mathias
doi: 10.1186/s13073-020-00755-0
pmid: 32576278
Background: The human plasma proteome is important for many biological processes and targets for diagnostics and therapy. It is therefore of great interest to understand the interplay of genetic and environmental factors that determines the specific protein levels in individuals and to gain deeper insight into the importance of genetic architecture related to the individual variability of plasma levels of proteins during adult life.
Methods: We have combined whole-genome sequencing, multiplex plasma protein profiling, and extensive clinical phenotyping in a longitudinal 2-year wellness study of 101 healthy individuals with repeated sampling. Analyses of genetic and non-genetic associations related to the variability of blood levels of proteins in these individuals were performed.
Results: The analyses showed that each individual has a unique protein profile, and we report on the intra-individual as well as inter-individual variation for 794 plasma proteins. A genome-wide association study (GWAS) using 7.3 million genetic variants identified by whole-genome sequencing revealed 144 independent variants across 107 proteins that showed strong association (P < 6 × 10^-11) between genetics and the inter-individual variability in protein levels. Many proteins not reported before (67 out of 107) were identified as having individual plasma levels affected by genetics. Our longitudinal analysis further demonstrates that these levels are stable during the 2-year study period. The variability of protein profiles as a consequence of environmental factors was also analyzed with a focus on the effects of weight loss and infections.
Conclusions: We show that the adult blood levels of many proteins are determined at birth by genetics, which is important for efforts aimed at understanding the relationship between plasma proteome profiles and human biology and disease.
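The abstract does not explain where the significance threshold comes from. One plausible reading, offered here as an assumption rather than the authors' stated derivation, is the conventional genome-wide threshold of 5 × 10^-8 with a Bonferroni correction for the 794 proteins tested. The short check below shows that this arithmetic gives roughly 6.3 × 10^-11, close to the rounded value quoted in the Results.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumed inputs: conventional GWAS alpha and the 794 proteins profiled.
GENOME_WIDE_ALPHA = 5e-8
N_PROTEINS = 794

threshold = bonferroni_threshold(GENOME_WIDE_ALPHA, N_PROTEINS)
print(f"{threshold:.2e}")
```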
Statistical power in COVID-19 case-control host genomic study design
Lin, Yu-Chung; Brooks, Jennifer D.; Bull, Shelley B.; Gagnon, France; Greenwood, Celia M. T.; Hung, Rayjean J.; Lawless, Jerald; Paterson, Andrew D.; Sun, Lei; Strug, Lisa J.
doi: 10.1186/s13073-020-00818-2
pmid: 33371892
The identification of genetic variation that directly impacts infection susceptibility to SARS-CoV-2 and disease severity of COVID-19 is an important step towards risk stratification, personalized treatment plans, and therapeutic and vaccine development and deployment. Given the importance of study design in infectious disease genetic epidemiology, we use simulation and draw on current estimates of exposure, infectivity, and test accuracy of COVID-19 to demonstrate the feasibility of detecting host genetic factors associated with susceptibility and severity in published COVID-19 study designs. We demonstrate that limited phenotypic data and exposure/infection information in the early stages of the pandemic significantly impact the ability to detect most genetic variants with moderate effect sizes, especially when studying susceptibility to SARS-CoV-2 infection. Our insights can aid in the interpretation of genetic findings emerging in the literature and guide the design of future host genetic studies.
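The effect described above, where incomplete exposure information erodes power, can be illustrated with a small Monte Carlo simulation. This is a minimal sketch under simplifying assumptions and is not the authors' simulation design: it uses a multiplicative allelic odds ratio, a 1-df allelic chi-square test, and a single parameter `misclassified` for the share of controls who would have been cases had they been exposed. All function names, parameters, and values are hypothetical.

```python
import math
import random


def allelic_chi2_pvalue(case_alt, case_total, ctrl_alt, ctrl_total):
    """P-value of the 1-df chi-square test on a 2x2 allele-count table."""
    alt = case_alt + ctrl_alt
    ref = (case_total - case_alt) + (ctrl_total - ctrl_alt)
    if alt == 0 or ref == 0:
        return 1.0  # monomorphic sample: no evidence either way
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    total = case_total + ctrl_total
    chi2 = total * (a * d - b * c) ** 2 / (case_total * ctrl_total * alt * ref)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def simulate_power(n_cases, n_controls, control_freq, odds_ratio,
                   misclassified=0.0, alpha=0.05, replicates=200, seed=0):
    """Fraction of simulated studies that reject the null at `alpha`."""
    rng = random.Random(seed)
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    case_freq = odds / (1.0 + odds)  # risk-allele frequency among true cases
    hits = 0
    for _ in range(replicates):
        case_alt = sum(rng.random() < case_freq for _ in range(2 * n_cases))
        ctrl_alt = 0
        for _ in range(n_controls):
            # A misclassified control is a latent case and carries
            # alleles drawn at the case frequency.
            latent_case = rng.random() < misclassified
            freq = case_freq if latent_case else control_freq
            ctrl_alt += (rng.random() < freq) + (rng.random() < freq)
        p = allelic_chi2_pvalue(case_alt, 2 * n_cases, ctrl_alt, 2 * n_controls)
        if p < alpha:
            hits += 1
    return hits / replicates


# Hypothetical design: 500 cases, 500 controls, risk-allele frequency 0.3.
clean = simulate_power(500, 500, 0.3, 1.5, misclassified=0.0)
noisy = simulate_power(500, 500, 0.3, 1.5, misclassified=0.8)
print(clean, noisy)
```

In this toy setting the estimated power drops sharply as the share of latent cases among controls grows, because the allele frequencies of the two groups converge. A realistic analysis would use a genome-wide `alpha` and far larger sample sizes.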
Pan-cancer identification of clinically relevant genomic subtypes using outcome-weighted integrative clustering
Arora, Arshi; Olshen, Adam B.; Seshan, Venkatraman E.; Shen, Ronglai
doi: 10.1186/s13073-020-00804-8
pmid: 33272320
Background: Comprehensive molecular profiling has revealed somatic variations in cancer at genomic, epigenomic, transcriptomic, and proteomic levels. The accumulating data has shown clearly that molecular phenotypes of cancer are complex and influenced by a multitude of factors. Conventional unsupervised clustering applied to a large patient population is inevitably driven by the dominant variation from major factors such as cell-of-origin or histology. Translation of these data into clinical relevance requires more effective extraction of information directly associated with patient outcome.
Methods: Drawing from ideas in supervised text classification, we developed survClust, an outcome-weighted clustering algorithm for integrative molecular stratification focusing on patient survival. survClust was performed on 18 cancer types across multiple data modalities including somatic mutation, DNA copy number, DNA methylation, and mRNA, miRNA, and protein expression from the Cancer Genome Atlas study to identify novel prognostic subtypes.
Results: Our analysis identified the prognostic role of high tumor mutation burden with concurrently high CD8 T cell immune marker expression and the aggressive clinical behavior associated with CDKN2A deletion across cancer types. Visualization of somatic alterations, at a genome-wide scale (total mutation burden, mutational signature, fraction genome altered) and at the individual gene level, using circomap further revealed indolent versus aggressive subgroups in a pan-cancer setting.
Conclusions: Our analysis has revealed prognostic molecular subtypes not previously identified by unsupervised clustering. The algorithm and tools we developed have direct utility toward patient stratification based on tumor genomics to inform clinical decision-making. The survClust software tool is available at https://github.com/arorarshi/survClust.