A risk assessment framework for multidrug-resistant Staphylococcus aureus using machine learning and mass spectrometry technology
Wang, Zhuo; Pang, Yuxuan; Chung, Chia-Ru; Wang, Hsin-Yao; Cui, Haiyan; Chiang, Ying-Chih; Horng, Jorng-Tzong; Lu, Jang-Jih; Lee, Tzong-Yi
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad330
pmid: 37742050
The emergence of multidrug-resistant bacteria is a critical global crisis that poses a serious threat to public health, particularly with the rise of multidrug-resistant Staphylococcus aureus. Accurate assessment of drug resistance is essential for appropriate treatment and prevention of transmission of these deadly pathogens. Early detection of drug resistance in patients is critical for providing timely treatment and reducing the spread of multidrug-resistant bacteria. This study aims to develop a novel risk assessment framework for S. aureus that can accurately determine the resistance to multiple antibiotics. The comprehensive 7-year study involved >20 000 isolates with susceptibility testing profiles of six antibiotics. By incorporating mass spectrometry and machine learning, the study was able to predict the susceptibility to four different antibiotics with high accuracy. To validate the accuracy of our models, we externally tested on an independent cohort and achieved impressive results with an area under the receiver operating characteristic curve of 0.94, 0.90, 0.86 and 0.91, and an area under the precision–recall curve of 0.93, 0.87, 0.87 and 0.81, respectively, for oxacillin, clindamycin, erythromycin and trimethoprim-sulfamethoxazole. In addition, the framework evaluated the level of multidrug resistance of the isolates by using the predicted drug resistance probabilities, interpreting them in the context of a multidrug resistance risk score and analyzing the performance contribution of different sample groups. The results of this study provide an efficient method for early antibiotic decision-making and a better understanding of the multidrug resistance risk of S. aureus.
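As a rough illustration of the evaluation metric reported in this abstract, the sketch below computes the area under the receiver operating characteristic curve by pairwise comparison of positive and negative scores. The `mdr_risk_score` helper is purely hypothetical: the abstract does not define how the multidrug resistance risk score is derived, so a plain mean of the per-antibiotic predicted resistance probabilities is assumed here.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive (label 1) scores
    higher than a random negative (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mdr_risk_score(probabilities):
    """Hypothetical risk score (an assumption, not the published definition):
    the mean of the predicted resistance probabilities across antibiotics."""
    return sum(probabilities.values()) / len(probabilities)


if __name__ == "__main__":
    # Toy data: 1 = resistant, 0 = susceptible.
    print(auroc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
    print(mdr_risk_score({"oxacillin": 0.8, "clindamycin": 0.6,
                          "erythromycin": 0.4,
                          "trimethoprim-sulfamethoxazole": 0.2}))
```

The pairwise form is quadratic in the number of isolates, so it is suitable only for small examples; a rank-based implementation would be preferable at the scale described in the study.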
Mutational signature assignment heterogeneity is widespread and can be addressed by ensemble approaches
Wu, Andy J; Perera, Akila; Kularatnarajah, Linganesan; Korsakova, Anna; Pitt, Jason J
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad331
pmid: 37742051
Single-base substitution (SBS) mutational signatures have become standard practice in cancer genomics. In lieu of de novo signature extraction, reference signature assignment allows users to estimate the activities of pre-established SBS signatures within individual malignancies. Several tools have been developed for this purpose, each with differing methodologies. However, due to a lack of standardization, there may be inter-tool variability in signature assignment. We deeply characterized three assignment strategies and five SBS signature assignment tools. We observed that assignment strategy choice can significantly influence results and interpretations. Despite varying recommendations by tools, Refit performed best by reducing overfitting and maximizing reconstruction of the original mutational spectra. Even after uniform application of Refit, tools varied remarkably in signature assignments both qualitatively (Jaccard index = 0.38–0.83) and quantitatively (Kendall tau-b = 0.18–0.76). This phenomenon was exacerbated for ‘flat’ signatures such as the homologous recombination deficiency signature SBS3. An ensemble approach (EnsembleFit), which leverages output from all five tools, increased SBS3 assignment accuracy in BRCA1/2-deficient breast carcinomas. After generating synthetic mutational profiles for thousands of pan-cancer tumors, EnsembleFit reduced signature activity assignment error 15.9–24.7% on average using Catalogue of Somatic Mutations In Cancer and non-standard reference signature sets. We have also released the EnsembleFit web portal (https://www.ensemblefit.pittlabgenomics.com) for users to generate or download ensemble-based SBS signature assignments using any strategy and combination of tools. Overall, we show that signature assignment heterogeneity across tools and strategies is non-negligible and propose a viable, ensemble solution.
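To make the qualitative comparison in this abstract concrete, the sketch below computes the Jaccard index between the sets of signatures two tools assign to the same sample, and then combines per-tool activities by a simple average. The averaging rule is an assumption for illustration only; the abstract does not state how EnsembleFit combines tool outputs, and the tool outputs shown are invented toy values.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two sets of signature names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty assignments are treated as identical
    return len(a & b) / len(a | b)


def ensemble_mean(assignments):
    """Average signature activities across tools (hypothetical rule).

    `assignments` is a list of dicts mapping signature name -> activity;
    a signature missing from a tool's output counts as activity 0.
    """
    signatures = sorted(set().union(*assignments))
    return {
        sig: sum(tool.get(sig, 0.0) for tool in assignments) / len(assignments)
        for sig in signatures
    }


if __name__ == "__main__":
    tool_a = {"SBS1": 0.5, "SBS3": 0.5}   # toy output of one tool
    tool_b = {"SBS1": 1.0}                # toy output of another tool
    print(jaccard_index(tool_a, tool_b))
    print(ensemble_mean([tool_a, tool_b]))
```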
Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations
Lei, Tianyuan; Chen, Ruoyu; Zhang, Shaoqiang; Chen, Yong
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad335
pmid: 37769630
Single-cell RNA sequencing (scRNA-seq) is a widely used technique for characterizing individual cells and studying gene expression at the single-cell level. Clustering plays a vital role in grouping similar cells together for various downstream analyses. However, the high sparsity and dimensionality of large scRNA-seq data pose challenges to clustering performance. Although several deep learning-based clustering algorithms have been proposed, most existing clustering methods have limitations in capturing the precise distribution types of the data or fully utilizing the relationships between cells, leaving a considerable scope for improving the clustering performance, particularly in detecting rare cell populations from large scRNA-seq data. We introduce DeepScena, a novel single-cell hierarchical clustering tool that fully incorporates nonlinear dimension reduction, negative binomial-based convolutional autoencoder for data fitting, and a self-supervision model for cell similarity enhancement. In comprehensive evaluation using multiple large-scale scRNA-seq datasets, DeepScena consistently outperformed seven popular clustering tools in terms of accuracy. Notably, DeepScena exhibits high proficiency in identifying rare cell populations within large datasets that contain large numbers of clusters. When applied to scRNA-seq data of multiple myeloma cells, DeepScena successfully identified not only previously labeled large cell types but also subpopulations in CD14 monocytes, T cells and natural killer cells, respectively.
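The negative binomial data fitting mentioned in this abstract can be illustrated with the standard mean-dispersion form of the negative binomial log-likelihood, which count-based autoencoders commonly minimize as a reconstruction loss. This is a generic sketch, not DeepScena's actual loss: the parameter names `mu` (predicted mean) and `theta` (dispersion) are assumptions, and the abstract does not specify the parameterization used.

```python
import math


def nb_log_likelihood(x, mu, theta):
    """Log-probability of count x under a negative binomial distribution
    with mean mu > 0 and dispersion theta > 0 (variance mu + mu**2 / theta)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def nb_loss(counts, means, theta):
    """Mean negative log-likelihood over observed counts and predicted means."""
    total = sum(nb_log_likelihood(x, m, theta) for x, m in zip(counts, means))
    return -total / len(counts)


if __name__ == "__main__":
    observed = [0, 3, 1, 7]           # toy expression counts for one cell
    predicted = [0.5, 2.5, 1.0, 6.0]  # toy decoder outputs (means)
    print(nb_loss(observed, predicted, theta=1.5))
```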
MMiKG: a knowledge graph-based platform for path mining of microbiota–mental diseases interactions
Sun, Haoran; Song, Zhaoqi; Chen, Qiuming; Wang, Meiling; Tang, Furong; Dou, Lijun; Zou, Quan; Yang, Fenglong
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad340
pmid: 37779250
The microbiota–gut–brain axis denotes a two-way system of interactions between the gut and the brain, comprising three key components: (1) gut microbiota, (2) intermediates and (3) mental ailments. These constituents communicate with one another to induce changes in the host’s mood, cognition and demeanor. Knowledge concerning the regulation of the host central nervous system by gut microbiota is fragmented and mostly confined to disorganized or semi-structured unrestricted texts. Such a format hinders the exploration and comprehension of unknown territories or the further advancement of artificial intelligence systems. Hence, we collated crucial information by scrutinizing an extensive body of literature, amalgamated the extant knowledge of the microbiota–gut–brain axis and depicted it in the form of a knowledge graph named MMiKG, which can be visualized on the GraphXR platform and the Neo4j database, correspondingly. By merging various associated resources and deducing prospective connections between gut microbiota and the central nervous system through MMiKG, users can acquire a more comprehensive perception of the pathogenesis of mental disorders and generate novel insights for advancing therapeutic measures. As a free and open-source platform, MMiKG can be accessed at http://yangbiolab.cn:8501/ with no login requirement.
ScSmOP: a universal computational pipeline for single-cell single-molecule multiomics data analysis
Jing, Kai; Xu, Yewen; Yang, Yang; Yin, Pengfei; Ning, Duo; Huang, Guangyu; Deng, Yuqing; Chen, Gengzhan; Li, Guoliang; Tian, Simon Zhongyuan; Zheng, Meizhen
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad343
pmid: 37779245
Single-cell multiomics techniques have been widely applied to detect the key signature of cells. These methods have achieved a single-molecule resolution and can even reveal spatial localization. These emerging methods provide insights elucidating the features of genomic, epigenomic and transcriptomic heterogeneity in individual cells. However, they have given rise to new computational challenges in data processing. Here, we describe Single-cell Single-molecule multiple Omics Pipeline (ScSmOP), a universal pipeline for barcode-indexed single-cell single-molecule multiomics data analysis. Essentially, the C language is utilized in ScSmOP to set up spaced-seed hash table-based algorithms for barcode identification according to ligation-based barcoding data and synthesis-based barcoding data, followed by data mapping and deconvolution. We demonstrate high reproducibility of data processing between ScSmOP and published pipelines in comprehensive analyses of single-cell omics data (scRNA-seq, scATAC-seq, scARC-seq), single-molecule chromatin interaction data (ChIA-Drop, SPRITE, RD-SPRITE), single-cell single-molecule chromatin interaction data (scSPRITE) and spatial transcriptomic data from various cell types and species. Additionally, ScSmOP shows more rapid performance and is a versatile, efficient, easy-to-use and robust pipeline for single-cell single-molecule multiomics data analysis.
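The idea of spaced-seed hash tables for barcode identification can be sketched as follows. ScSmOP itself is written in C and its exact seeds and data structures are not given in the abstract, so everything below (the two half-length masks, the function names, the one-mismatch tolerance) is an illustrative assumption written in Python for brevity. Each mask selects a subset of barcode positions as the hash key, so a read with a mismatch outside the selected positions still finds its barcode.

```python
from collections import defaultdict


def spaced_key(seq, mask):
    """Keep only the characters of seq where the mask has a '1'."""
    return "".join(c for c, m in zip(seq, mask) if m == "1")


def build_index(barcodes, masks):
    """Hash table: (mask number, spaced key) -> set of whitelist barcodes."""
    index = defaultdict(set)
    for bc in barcodes:
        for i, mask in enumerate(masks):
            index[(i, spaced_key(bc, mask))].add(bc)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def identify(read, index, masks, max_mismatches=1):
    """Return the unique whitelist barcode within max_mismatches of read,
    or None if there is no match or the match is ambiguous."""
    candidates = set()
    for i, mask in enumerate(masks):
        candidates |= index.get((i, spaced_key(read, mask)), set())
    hits = [bc for bc in candidates
            if len(bc) == len(read) and hamming(bc, read) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    # Two complementary masks: any single mismatch leaves one half intact.
    masks = ["11110000", "00001111"]
    index = build_index(["ACGTACGT", "TTGGCCAA"], masks)
    print(identify("ACGTACGA", index, masks))  # one mismatch at the last base
```

Only candidates retrieved through a seed hit are verified by Hamming distance, which avoids comparing every read against the full barcode whitelist.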
PTBGRP: predicting phage–bacteria interactions with graph representation learning on microbial heterogeneous information network
Pan, Jie; You, Zhuhong; You, Wencai; Zhao, Tian; Feng, Chenlu; Zhang, Xuexia; Ren, Fengzhi; Ma, Sanxing; Wu, Fan; Wang, Shiwei; Sun, Yanmei
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad328
pmid: 37742053
Identifying potential bacteriophage (phage) candidates to treat bacterial infections plays an essential role in the research of human pathogens. Computational approaches are recognized as a valid way to predict bacteria and target phages. However, most current methods utilize only lower-order biological information without considering the higher-order connectivity patterns that can help improve predictive accuracy. Therefore, we developed a novel microbial heterogeneous interaction network (MHIN)–based model called PTBGRP to predict new phages for bacterial hosts. Specifically, PTBGRP first constructs an MHIN by integrating phage–bacteria interaction (PBI) and six bacteria–bacteria interaction networks with their biological attributes. Then, different representation learning methods are deployed to extract higher-level biological features and lower-level topological features from the MHIN. Finally, PTBGRP employs a deep neural network as the classifier to predict unknown PBI pairs based on the fused biological information. Experimental results demonstrated that PTBGRP achieves the best performance on the corresponding ESKAPE pathogens and PBI dataset when compared with state-of-the-art methods. In addition, case studies of Klebsiella pneumoniae and Staphylococcus aureus further indicate that the consideration of rich heterogeneous information enables PTBGRP to accurately predict PBI from a more comprehensive perspective. The webserver of the PTBGRP predictor is freely available at http://120.77.11.78/PTBGRP/.
Benchmarking deep learning methods for predicting CRISPR/Cas9 sgRNA on- and off-target activities
Zhang, Guishan; Luo, Ye; Dai, Xianhua; Dai, Zhiming
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad333
In silico design of single guide RNA (sgRNA) plays a critical role in the clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) system. Continuous efforts are aimed at improving sgRNA design with efficient on-target activity and reduced off-target mutations. In the last 5 years, an increasing number of deep learning-based methods have achieved breakthrough performance in predicting sgRNA on- and off-target activities. Nevertheless, it is worthwhile to systematically evaluate these methods for their predictive abilities. In this review, we conducted a systematic survey on the progress in prediction of on- and off-target editing. We investigated the performances of 10 mainstream deep learning-based on-target predictors using nine public datasets with different sample sizes. We found that in most scenarios, these methods showed greater predictive power on large- and medium-scale datasets than on small-scale datasets. In addition, we performed unbiased experiments to provide in-depth comparison of eight representative approaches for off-target prediction on 12 publicly available datasets with various imbalanced ratios of positive/negative samples. Most methods showed excellent performance on balanced datasets but have much room for improvement on moderately and severely imbalanced datasets. This study provides comprehensive perspectives on CRISPR/Cas9 sgRNA on- and off-target activity prediction and on improving method development.
A fast and accurate method for SARS-CoV-2 genomic tracing
Ma, Wentai; Shi, Leisheng; Li, Mingkun
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad339
pmid: 37779249
To contain infectious diseases, it is crucial to determine the origin and transmission routes of the pathogen, as well as how the virus evolves. With the development of genome sequencing technology, genome epidemiology has emerged as a powerful approach for investigating the source and transmission of pathogens. In this study, we first presented the rationale for genomic tracing of SARS-CoV-2 and the challenges we currently face. Identifying the most genetically similar reference sequence to the query sequence is a critical step in genome tracing, typically achieved using either a phylogenetic tree or a sequence similarity search. However, these methods become inefficient or computationally prohibitive when dealing with tens of millions of sequences in the reference database, as we encountered during the COVID-19 pandemic. To address this challenge, we developed a novel genomic tracing algorithm capable of processing 6 million SARS-CoV-2 sequences in less than a minute. Instead of constructing a giant phylogenetic tree, we devised a weighted scoring system based on mutation characteristics to quantify sequence similarity. The developed method demonstrated superior performance compared to previous methods. Additionally, an online platform was developed to facilitate genomic tracing and visualization of the spatiotemporal distribution of sequences. The method will be a valuable addition to standard epidemiological investigations, enabling more efficient genomic tracing. Furthermore, the computational framework can be easily adapted to other pathogens, paving the way for routine genomic tracing of infectious diseases.
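A mutation-based weighted score of the kind described in this abstract might look like the sketch below. The abstract does not give the actual weighting scheme, so the choices here are assumptions: each genome is represented as a set of mutations relative to a reference, rarer mutations get larger weights (a log inverse-frequency weight), unshared mutations incur a flat penalty, and an inverted index limits scoring to database sequences that share at least one mutation with the query. Mutation names and database contents are toy values.

```python
import math
from collections import Counter, defaultdict


def mutation_weights(db):
    """Hypothetical weights: rarer mutations in the database score higher."""
    n = len(db)
    counts = Counter(m for muts in db.values() for m in muts)
    return {m: math.log((n + 1) / c) for m, c in counts.items()}


def build_inverted_index(db):
    """Map each mutation to the names of the sequences that carry it."""
    inverted = defaultdict(set)
    for name, muts in db.items():
        for m in muts:
            inverted[m].add(name)
    return inverted


def best_match(query, db, weights, inverted, penalty=1.0):
    """Return (name, score) of the most similar database sequence, or None.

    Score = summed weights of shared mutations minus a penalty for every
    mutation present in only one of the two sequences.
    """
    candidates = set()
    for m in query:
        candidates |= inverted.get(m, set())
    best = None
    for name in sorted(candidates):  # sorted for deterministic tie-breaking
        muts = db[name]
        score = sum(weights[m] for m in query & muts) - penalty * len(query ^ muts)
        if best is None or score > best[1]:
            best = (name, score)
    return best


if __name__ == "__main__":
    db = {
        "seqA": {"C241T", "A23403G"},
        "seqB": {"C241T", "A23403G", "G28881A"},
        "seqC": {"T100C"},
    }
    weights = mutation_weights(db)
    inverted = build_inverted_index(db)
    print(best_match({"C241T", "A23403G", "G28881A"}, db, weights, inverted))
```

Because only sequences sharing a mutation with the query are scored, no tree construction or all-against-all alignment is needed, which is the general motivation the abstract gives for avoiding a giant phylogenetic tree.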
HTCL-DDI: a hierarchical triple-view contrastive learning framework for drug–drug interaction prediction
Zhang, Ran; Wang, Xuezhi; Wang, Pengfei; Meng, Zhen; Cui, Wenjuan; Zhou, Yuanchun
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad324
pmid: 37742052
Drug–drug interaction (DDI) prediction can discover potential risks of drug combinations in advance by detecting drug pairs that are likely to interact with each other, sparking an increasing demand for computational methods of DDI prediction. However, existing computational DDI methods mostly rely on the single-view paradigm, failing to handle the complex features and intricate patterns of DDIs due to the limited expressiveness of the single view. To this end, we propose a Hierarchical Triple-view Contrastive Learning framework for Drug–Drug Interaction prediction (HTCL-DDI), leveraging the molecular, structural and semantic views to model the complicated information involved in DDI prediction. To aggregate the intra-molecular compositional and structural information, we present a dual attention-aware network in the molecular view. Based on the molecular view, to further capture inter-molecular information, we utilize the one-hop neighboring information and high-order semantic relations in the structural view and semantic view, respectively. Then, we introduce contrastive learning to enhance drug representation learning from multifaceted aspects and improve the robustness of HTCL-DDI. Finally, we conduct extensive experiments on three real-world datasets. All the experimental results show the significant improvement of HTCL-DDI over the state-of-the-art methods, which also demonstrates that HTCL-DDI opens new avenues for ensuring medication safety and identifying synergistic drug combinations.
Fuse feeds as one: cross-modal framework for general identification of AMPs
Zhang, Wentao; Xu, Yanchao; Wang, Aowen; Chen, Gang; Zhao, Junbo
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad336
pmid: 37779248
Antimicrobial peptides (AMPs) are promising candidates for the development of new antibiotics due to their broad-spectrum activity against a range of pathogens. However, identifying AMPs among a vast pool of candidates is challenging due to their complex structures and diverse sequences. In this study, we propose SenseXAMP, a cross-modal framework that leverages semantic embeddings and protein descriptors (PDs) of input sequences to improve the identification performance of AMPs. SenseXAMP includes a multi-input alignment module and a cross-representation fusion module to explore the hidden information between the two input features and better leverage the fusion feature. To better address the AMP identification task, we accumulate the latest annotated AMP data to form larger benchmark datasets. Additionally, we expand the existing AMP identification task settings by adding an AMP regression task to meet more specific requirements such as antimicrobial activity prediction. The experimental results indicated that SenseXAMP outperformed existing state-of-the-art models on multiple AMP-related datasets, including commonly used AMP classification datasets and our proposed benchmark datasets. Furthermore, we conducted a series of experiments to demonstrate the complementary nature of traditional PDs and protein pre-training models in AMP tasks. Our experiments reveal that SenseXAMP can effectively combine the advantages of PDs to improve the performance of protein pre-training models in AMP tasks.
Cracking the pattern of tumor evolution based on single-cell copy number alterations
Wang, Ying; Zhang, Min; Shi, Jian; Zhu, Yue; Wang, Xin; Zhang, Shaojun; Wang, Fang
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad341
Copy number alterations (CNAs) are a key characteristic of tumor development and progression. The accumulation of various CNAs during tumor development plays a critical role in driving tumor evolution. Heterogeneous clones driven by distinct CNAs have different selective advantages, leading to differential patterns of tumor evolution that are essential for developing effective cancer therapies. Recent advances in single-cell sequencing technology have enabled genome-wide copy number profiling of tumor cell populations at single-cell resolution. This has made it possible to explore the evolutionary patterns of CNAs and accurately discover the mechanisms of intra-tumor heterogeneity. Here, we propose a two-step statistical approach that distinguishes neutral, linear, branching and punctuated evolutionary patterns for a tumor cell population based on single-cell copy number profiles. We assessed our approach using a variety of simulated and real single-cell genomic and transcriptomic datasets, demonstrating its high accuracy and robustness in predicting tumor evolutionary patterns. We applied our approach to single-cell DNA sequencing data from 20 breast cancer patients and observed that punctuated evolution is the dominant evolutionary pattern in breast cancer. Similar conclusions were drawn when applying the approach to single-cell RNA sequencing data obtained from 132 various cancer patients. Moreover, we found that differential immune cell infiltration is associated with specific evolutionary patterns. The source code of our study is available at https://github.com/FangWang-SYSU/PTEM.
Microbiome Metabolome Integration Platform (MMIP): a web-based platform for microbiome and metabolome data integration and feature identification
Gautam, Anupam; Bhowmik, Debaleena; Basu, Sayantani; Zeng, Wenhuan; Lahiri, Abhishake; Huson, Daniel H; Paul, Sandip
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad325
pmid: 37771003
A microbial community maintains its ecological dynamics via metabolite crosstalk. Hence, knowledge of the metabolome, alongside its populace, would help us understand the functionality of a community and also predict how it will change in atypical conditions. Methods that employ low-cost metagenomic sequencing data can predict the metabolic potential of a community, that is, its ability to produce or utilize specific metabolites. These, in turn, can potentially serve as markers of biochemical pathways that are associated with different communities. We developed MMIP (Microbiome Metabolome Integration Platform), a web-based analytical and predictive tool that can be used to compare the taxonomic content, diversity variation and the metabolic potential between two sets of microbial communities from targeted amplicon sequencing data. MMIP is capable of highlighting statistically significant taxonomic, enzymatic and metabolic attributes as well as learning-based features associated with one group in comparison with another. Furthermore, MMIP can predict linkages among species or groups of microbes in the community, specific enzyme profiles, compounds or metabolites associated with such a group of organisms. With MMIP, we aim to provide a user-friendly, online web server for performing key microbiome-associated analyses of targeted amplicon sequencing data, predicting metabolite signature, and using learning-based linkage analysis, without the need for initial metabolomic analysis, and thereby helping in hypothesis generation.
Attention-based generative adversarial networks improve prognostic outcome prediction of cancer from multimodal data
Shi, Mingguang; Li, Xuefeng; Li, Mingna; Si, Yichong
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad329
pmid: 37756592
The prediction of prognostic outcome is critical for the development of efficient cancer therapeutics and potential personalized medicine. However, due to the heterogeneity and diversity of multimodal data of cancer, data integration and feature selection remain a challenge for prognostic outcome prediction. We proposed a deep learning method with a generative adversarial network based on sequential channel-spatial attention modules (CSAM-GAN), a multimodal data integration and feature selection approach, for accomplishing prognostic stratification tasks in cancer. Sequential channel-spatial attention modules equipped with an encoder–decoder are applied to the input features of multimodal data to accurately refine selected features. A discriminator network was proposed so that the generator and discriminator learn in an adversarial way to accurately describe the complex heterogeneous information of multimodal data. We conducted extensive experiments with various feature selection and classification methods and confirmed that the CSAM-GAN via the multilayer deep neural network (DNN) classifier outperformed these baseline methods on two different multimodal data sets with miRNA expression, mRNA expression and histopathological image data: lower-grade glioma and kidney renal clear cell carcinoma. The CSAM-GAN via the multilayer DNN classifier bridges the gap between heterogeneous multimodal data and prognostic outcome prediction.
GOWDL: gene ontology-driven wide and deep learning model for cell typing of scRNA-seq data
Fiannaca, Antonino; La Rosa, Massimo; La Paglia, Laura; Gaglio, Salvatore; Urso, Alfonso
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad332
pmid: 37756593
Single-cell RNA-sequencing (scRNA-seq) allows for obtaining genomic and transcriptomic profiles of individual cells. These data make it possible to characterize tissues at the cell level. In this context, one of the main analyses exploiting scRNA-seq data is identifying the cell types within tissue to estimate the quantitative composition of cell populations. Due to the massive amount of available scRNA-seq data, automatic classification approaches for cell typing, based on the most recent deep learning technology, are needed. Here, we present the gene ontology-driven wide and deep learning (GOWDL) model for classifying cell types in several tissues. GOWDL implements a hybrid architecture that considers the functional annotations found in Gene Ontology and the marker genes typical of specific cell types. We performed cross-validation and independent external testing, comparing our algorithm with 12 other state-of-the-art predictors. Classification scores demonstrated that GOWDL reached the best results over five different tissues, except for recall, where we obtained about 92% versus 97% for the best tool. Finally, we presented a case study on classifying immune cell populations in breast cancer using a hierarchical approach based on GOWDL.
3D-SMGE: a pipeline for scaffold-based molecular generation and evaluation
Xu, Chao; Liu, Runduo; Huang, Shuheng; Li, Wenchao; Li, Zhe; Luo, Hai-Bin
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad327
pmid: 37756591
In the process of drug discovery, one of the key problems is how to improve the biological activity and ADMET properties starting from a specific structure, which is also called structural optimization. Based on a starting scaffold, the use of a deep generative model to generate molecules with desired drug-like properties will provide a powerful tool to accelerate the structural optimization process. However, existing generative models still struggle to extract molecular features efficiently in 3D space to generate drug-like 3D molecules. Moreover, most existing ADMET prediction models predict different properties through a single model, which can result in reduced prediction accuracy on some datasets. To effectively generate molecules from a specific scaffold and provide a basis for structural optimization, we present 3D-SMGE (3-Dimensional Scaffold-based Molecular Generation and Evaluation), which consists of molecular generation and prediction of ADMET properties. For molecular generation, we proposed 3D-SMG, a novel deep generative model for the end-to-end design of 3D molecules. In the 3D-SMG model, we designed the cross-aggregated continuous-filter convolution (ca-cfconv), which is used to achieve efficient and low-cost 3D spatial feature extraction while ensuring the invariance of atomic space rotation. 3D-SMG was shown to generate valid, unique and novel molecules with high drug-likeness. Besides, the proposed data-adaptive multi-model ADMET prediction method outperformed or maintained the best evaluation metrics on 24 out of 27 ADMET benchmark datasets. 3D-SMGE is anticipated to emerge as a powerful tool for hit-to-lead structural optimizations and accelerate the drug discovery process.
TransIntegrator: capture nearly full protein-coding transcript variants via integrating Illumina and PacBio transcriptomes
Lin, Zhe; Qin, Yangmei; Chen, Hao; Shi, Dan; Zhong, Mindong; An, Te; Chen, Linshan; Wang, Yiquan; Lin, Fan; Li, Guang; Ji, Zhi-Liang
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad334
pmid: 37779246
Genes have the ability to produce transcript variants that perform specific cellular functions. However, accurately detecting all transcript variants remains a long-standing challenge, especially when working with poorly annotated genomes or without a known genome. To address this issue, we have developed a new computational method, TransIntegrator, which enables transcriptome-wide detection of novel transcript variants. For this, we determined 10 Illumina sequencing transcriptomes and a PacBio full-length transcriptome for consecutive embryo development stages of amphioxus, a species of great evolutionary importance. Based on the transcriptomes, we employed TransIntegrator to create a comprehensive transcript variant library, namely iTranscriptome. The resulting iTranscriptome contained 91 915 distinct transcript variants, with an average of 2.4 variants per gene. This substantially improved current amphioxus genome annotation by expanding the number of genes from 21 954 to 38 777. Further analysis showed that the gene expansion was largely ascribed to integration of multiple Illumina datasets instead of involving the PacBio data. Moreover, we demonstrated an example application of TransIntegrator, via generating iTranscriptome, in aiding accurate transcriptome assembly, which significantly outperformed other hybrid methods such as IDP-denovo and Trinity. For user convenience, we have deposited the source code of TransIntegrator on GitHub as well as a conda package in Anaconda. In summary, this study proposes an affordable but efficient method for reliable transcriptomic research in most species.
Spatom: a graph neural network for structure-based protein–protein interaction site prediction
Wu, Haonan; Han, Jiyun; Zhang, Shizhuo; Xin, Gaojia; Mou, Chaozhou; Liu, Juntao
2023 Briefings in Bioinformatics
doi: 10.1093/bib/bbad345
pmid: 37779247
Accurate identification of protein–protein interaction (PPI) sites remains a computational challenge. We propose Spatom, a novel framework for PPI site prediction. This framework first defines a weighted digraph for a protein structure to precisely characterize the spatial contacts of residues, then performs a weighted digraph convolution to aggregate both spatial local and global information and finally adds an improved graph attention layer to drive the predicted sites to form more continuous region(s). Spatom was tested on a diverse set of challenging protein–protein complexes and demonstrated the best performance among all the compared methods. Furthermore, when tested on multiple popular proteins in a case study, Spatom clearly identifies the interaction interfaces and captures the majority of hotspots. Spatom is expected to contribute to the understanding of protein interactions and drug designs targeting protein binding.
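One way to picture a weighted digraph over residue contacts is sketched below. The abstract does not describe how Spatom defines its edges or weights, so this construction is entirely an assumption: residues are connected when their representative coordinates lie within a distance cutoff, raw weights decay with distance, and each residue's outgoing weights are normalized to sum to 1. That normalization makes the weight from i to j differ from the weight from j to i, which is what makes the graph directed.

```python
import math


def weighted_digraph(coords, cutoff=8.0):
    """Build {i: {j: weight}} from residue coordinates (hypothetical scheme).

    An edge i -> j exists when the Euclidean distance is within `cutoff`;
    outgoing weights of each node are normalized to sum to 1.
    """
    n = len(coords)
    graph = {}
    for i in range(n):
        raw = {}
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if d <= cutoff:
                raw[j] = 1.0 / (1.0 + d)  # closer residues get larger raw weight
        total = sum(raw.values())
        graph[i] = {j: w / total for j, w in raw.items()} if total else {}
    return graph


if __name__ == "__main__":
    # Three toy residues on a line, 3 and 6 distance units apart.
    residues = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
    for node, edges in weighted_digraph(residues).items():
        print(node, edges)
```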