TY - JOUR
AU - Datta, Shrayasi
AU - Ghosh, Chinmoy
AU - Choudhury, J. Pal
AB - Pharmacological datasets such as Yeast and Escherichia coli (E. coli) have a significant impact on drug production in the healthcare industry. Yeast is well recognized as a significant constituent in the production of pharmaceuticals for human use, owing to its cellular structure, which resembles that of human cells. E. coli is widely regarded as the preferred expression host in the biotechnology industry for the efficient, cost-effective, large-scale production of proteins. This preference is particularly evident for non-glycosylated proteins, which are of significant interest for their therapeutic applications. This study aims to determine the efficacy of machine learning (ML) methods in classifying pharmacological datasets such as Yeast and E. coli. Because both Yeast and E. coli are inherently imbalanced datasets, the synthetic minority oversampling technique (SMOTE) is used to handle the imbalanced classification problem. The combined feature subset selection (CFSS) approach has also been employed to assess overall classification performance, but classification with CFSS has not been very promising. According to the findings, classification on SMOTE-oversampled data gives better results, and across several performance and evaluation metrics, random forest achieves the best overall prediction performance for both the Yeast and E. coli datasets, with an accuracy of 87% and 97%, precision of 87% and 97%, recall of 87% and 98%, and F1 score of 85% and 98%, respectively.
TI - Classification of imbalanced datasets utilizing the synthetic minority oversampling method in conjunction with several machine learning techniques
JF - Iran Journal of Computer Science
DO - 10.1007/s42044-024-00207-7
DA - 2025-03-01
UR - https://www.deepdyve.com/lp/springer-journals/classification-of-imbalanced-datasets-utilizing-the-synthetic-minority-uZtLbpW6kC
SP - 51
EP - 68
VL - 8
IS - 1
DP - DeepDyve
ER -