Optimizing clinico-genomic disease prediction across ancestries: a machine learning strategy with Pareto improvement
Gao, Yan; Cui, Yan
doi: 10.1186/s13073-024-01345-0
pmid: 38835075
Background: Accurate prediction of an individual’s predisposition to diseases is vital for preventive medicine and early intervention. Various statistical and machine learning models have been developed for disease prediction using clinico-genomic data. However, the accuracy of clinico-genomic prediction of diseases may vary significantly across ancestry groups due to their unequal representation in clinical genomic datasets.
Methods: We introduced a deep transfer learning approach to improve the performance of clinico-genomic prediction models for data-disadvantaged ancestry groups. We conducted machine learning experiments on multi-ancestral genomic datasets of lung cancer, prostate cancer, and Alzheimer’s disease, as well as on synthetic datasets with built-in data inequality and distribution shifts across ancestry groups.
Results: Deep transfer learning significantly improved disease prediction accuracy for data-disadvantaged populations in our multi-ancestral machine learning experiments. In contrast, transfer learning based on linear frameworks did not achieve comparable improvements for these data-disadvantaged populations.
Conclusions: This study shows that deep transfer learning can enhance fairness in multi-ancestral machine learning by improving prediction accuracy for data-disadvantaged populations without compromising prediction accuracy for other populations, thus providing a Pareto improvement towards equitable clinico-genomic prediction of diseases.
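The notion of a Pareto improvement used in this abstract can be made concrete with a short check: a new model is a Pareto improvement over a baseline if no group's score gets worse and at least one group's score gets better. The sketch below is only an illustration of that definition, not the authors' evaluation code; the function name, the group labels, and the score values are all hypothetical.

```python
from typing import Mapping


def is_pareto_improvement(
    baseline: Mapping[str, float],
    candidate: Mapping[str, float],
    tol: float = 0.0,
) -> bool:
    """Return True if `candidate` is no worse than `baseline` for every
    group (within `tol`) and strictly better for at least one group.

    Both mappings go from a group label to a higher-is-better score.
    """
    if set(baseline) != set(candidate):
        raise ValueError("both models must be scored on the same groups")
    no_group_worse = all(candidate[g] >= baseline[g] - tol for g in baseline)
    some_group_better = any(candidate[g] > baseline[g] + tol for g in baseline)
    return no_group_worse and some_group_better


# Hypothetical per-group scores, invented purely for illustration.
baseline_scores = {"group_a": 0.80, "group_b": 0.65}
transfer_scores = {"group_a": 0.80, "group_b": 0.71}
tradeoff_scores = {"group_a": 0.76, "group_b": 0.71}

print(is_pareto_improvement(baseline_scores, transfer_scores))  # True
print(is_pareto_improvement(baseline_scores, tradeoff_scores))  # False
```

The second call returns False because the gain for one group is paid for by a loss in the other, which is a trade-off rather than a Pareto improvement.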

Race-specific coregulatory and transcriptomic profiles associated with DNA methylation and androgen receptor in prostate cancer
Ramakrishnan, Swathi; Cortes-Gomez, Eduardo; Athans, Sarah R.; Attwood, Kristopher M.; Rosario, Spencer R.; Kim, Se Jin; Mager, Donald E.; Isenhart, Emily G.; Hu, Qiang; Wang, Jianmin; Woloszynska, Anna
doi: 10.1186/s13073-024-01323-6
pmid: 38566104
Background: Prostate cancer is a significant health concern, particularly among African American (AA) men, who exhibit higher incidence and mortality than European American (EA) men. Understanding the molecular mechanisms underlying these disparities is imperative for enhancing clinical management and achieving better outcomes.
Methods: Employing a multi-omics approach, we analyzed prostate cancer in both AA and EA men. Using Illumina methylation arrays and RNA sequencing, we investigated DNA methylation and gene expression in tumor and non-tumor prostate tissues. Additionally, Boolean analysis was used to unravel complex networks contributing to racial disparities in prostate cancer.
Results: When comparing tumor and adjacent non-tumor prostate tissues, we found that DNA hypermethylated regions are enriched for PRC2/H3K27me3 pathways and EZH2/SUZ12 cofactors. Olfactory/ribosomal pathways and distinct cofactors, including CTCF and KMT2A, were enriched in DNA hypomethylated regions in prostate tumors from AA men. We identified race-specific inverse associations of DNA methylation with expression of several androgen receptor (AR) associated genes, including the GATA family of transcription factors and TRIM63. This suggests that race-specific dysregulation of the AR signaling pathway exists in prostate cancer. To investigate the effect of AR inhibition on race-specific gene expression changes, we generated in-silico patient-specific prostate cancer Boolean networks. Our simulations revealed that prolonged AR inhibition causes significant dysregulation of TGF-β, IDH1, and cell cycle pathways specifically in AA prostate cancer. We further quantified global gene expression changes, which revealed differential expression of genes related to microtubules, immune function, and TMPRSS2-fusion pathways, specifically in prostate tumors of AA men. Enrichment of these pathways significantly correlated with an altered risk of disease progression in a race-specific manner.
Conclusions: Our study reveals unique signaling networks underlying prostate cancer biology in AA and EA men, offering potential insights for clinical management strategies tailored to specific racial groups. Targeting AR and associated pathways could be particularly beneficial in addressing the disparities in prostate cancer outcomes between AA and EA men. Further investigation into these identified pathways may lead to the development of personalized therapeutic approaches to improve outcomes for prostate cancer patients across different racial backgrounds.

Circular RNA landscape in extracellular vesicles from human biofluids
Zhao, Jingjing; Li, Qiaojuan; Hu, Jia; Yu, Hongwu; Shen, Youmin; Lai, Hongyan; Li, Qin; Zhang, Hena; Li, Yan; Fang, Zhuting; Huang, Shenglin
doi: 10.1186/s13073-024-01400-w
pmid: 39482783
Background: Circular RNAs (circRNAs) have emerged as a prominent class of covalently closed single-stranded RNA molecules that exhibit tissue-specific expression and potential as biomarkers in extracellular vesicles (EVs) derived from liquid biopsies. Still, their characteristics and applications in EVs remain to be unveiled.
Methods: We performed a comprehensive analysis of EV-derived circRNAs (EV-circRNAs) using transcriptomics data obtained from 1082 human body fluid samples, including plasma, urine, cerebrospinal fluid (CSF), and bile. Our validation strategy used RT-qPCR and RNA immunoprecipitation assays, complemented by computational techniques for analyzing EV-circRNA features and RNA-binding protein (RBP) interactions.
Results: We identified 136,327 EV-circRNAs from various human body fluids. Notably, a considerable number of circRNAs with a high back-splicing ratio are highly enriched in EVs compared to linear RNAs. Additionally, we discovered brain-specific circRNAs enriched in plasma EVs and cancer-associated EV-circRNAs linked to clinical outcomes. Moreover, we demonstrated that EV-circRNAs have the potential to serve as biomarkers for evaluating immunotherapy efficacy in non-small cell lung cancer (NSCLC). Importantly, we identified the involvement of RBPs, particularly YBX1, in the sorting mechanism of circRNAs into EVs.
Conclusions: This study unveils the extensive repertoire of EV-circRNAs across human biofluids, offering insights into their potential as disease biomarkers and their mechanistic roles within EVs. The identification of specific circRNAs and the elucidation of RBP-mediated sorting mechanisms open new avenues for the clinical application of EV-circRNAs in disease diagnostics and therapeutics.

A genome-based survey of invasive pneumococci in Norway over four decades reveals lineage-specific responses to vaccination
Eldholm, Vegard; Osnes, Magnus N.; Bjørnstad, Martha L.; Straume, Daniel; Gladstone, Rebecca A.
doi: 10.1186/s13073-024-01396-3
pmid: 39456053
Background: Streptococcus pneumoniae is a major cause of mortality globally. The introduction of pneumococcal conjugate vaccines (PCVs) has reduced the incidence of the targeted serotypes significantly, but expansion of non-targeted serotypes, serotype replacement, and incomplete vaccine-targeting contribute to pneumococcal disease in the vaccine era. Here, we characterize the changing population genetic landscape of S. pneumoniae in Norway over a 41-year period (1982–2022).
Methods: Since 2018, all cases of invasive pneumococcal disease have undergone whole-genome sequencing (WGS) at the Norwegian Institute of Public Health. In order to characterize the changing population over time, historical isolates were re-cultured and sequenced, resulting in a historical WGS dataset. Isolates were assigned to global pneumococcal sequence clusters (GPSCs) using PathogenWatch and assigned to serotypes using in silico (SeroBA) and in vitro methods (Quellung reaction). Temporal phylogenetic analyses were performed on GPSCs of particular interest.
Results: The availability of WGS data allowed us to study capsular variation at the level of individual lineages. We detect highly divergent fates for different GPSCs following the introduction of PCVs. For two out of eight major GPSCs, we identified multiple instances of serotype switching from vaccine types to non-vaccine types. Dating analyses suggest that most instances of serotype switching predated the introduction of PCVs, but expansion occurred after their introduction. Furthermore, selection for penicillin non-susceptibility was not a driving force for the changing serotype distribution within the GPSCs over time.
Conclusions: PCVs have been major shapers of the Norwegian disease-causing pneumococcal population, both at the level of serotype distributions and the underlying lineage dynamics. Overall, the introduction of PCVs has reduced the incidence of invasive disease. However, some GPSCs initially dominated by vaccine types escaped the effect of vaccination through expansion of non-vaccine serotypes. Close monitoring of circulating lineages and serotypes will be key for ensuring optimal vaccination coverage going forward.

Variability of polygenic prediction for body mass index in Africa
Chikowore, Tinashe; Läll, Kristi; Micklesfield, Lisa K.; Lombard, Zane; Goedecke, Julia H.; Fatumo, Segun; Norris, Shane A.; Magi, Reedik; Ramsay, Michele; Franks, Paul W.; Pare, Guillaume; Morris, Andrew P.
doi: 10.1186/s13073-024-01348-x
pmid: 38816834
Background: Polygenic prediction studies in continental Africans are scarce. Africa’s genetic and environmental diversity poses a challenge that limits the generalizability of polygenic risk scores (PRS) for body mass index (BMI) within the continent. Studies to understand the factors that affect PRS variability within Africa are required.
Methods: Using the first multi-ancestry genome-wide association study (GWAS) meta-analysis for BMI involving continental Africans, we derived a multi-ancestry PRS and compared its performance to a European ancestry-specific PRS in continental Africans (AWI-Gen study) and a European cohort (Estonian Biobank). We then evaluated the factors affecting the performance of the PRS in Africans, which included fine-mapping resolution, allele frequencies, linkage disequilibrium patterns, and PRS-environment interactions.
Results: Polygenic prediction of BMI in continental Africans is poor compared to that in European ancestry individuals. However, we show that the multi-ancestry PRS is more predictive than the European ancestry-specific PRS due to its improved fine-mapping resolution. We noted regional variation in polygenic prediction across Africa’s East, South, and West regions, which was driven by a complex interplay of the PRS with environmental factors, such as physical activity, smoking, alcohol intake, and socioeconomic status.
Conclusions: Our findings highlight the role of gene-environment interactions in PRS prediction variability in Africa. PRS methods that correct for these interactions, coupled with the increased representation of Africans in GWAS, may improve PRS prediction in Africa.
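For readers unfamiliar with the term, a polygenic risk score in its most common form is a weighted sum of an individual's effect-allele dosages, with weights taken from GWAS effect estimates. The sketch below shows only that generic definition; it is not the pipeline used in the study, and the function name, weights, and dosages are hypothetical values chosen for illustration.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages for one individual.

    dosages: effect-allele count per variant (0, 1, or 2, or an imputed
             fractional value).
    weights: per-variant effect sizes, in the same variant order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three variants and one person's dosages.
example_weights = [0.5, -0.25, 0.125]
example_dosages = [2, 1, 0]

print(polygenic_score(example_dosages, example_weights))  # 0.75
```

Because the weights are estimated in a discovery GWAS, the same dosages can yield a less predictive score in a population whose allele frequencies or linkage disequilibrium patterns differ from the discovery cohort, which is the kind of variability the abstract examines.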