Classifying emails as ham or spam based on their content is essential. Capturing the semantic and syntactic meaning of words and representing them as high-dimensional feature vectors for processing is the most difficult challenge in email categorization. The purpose of this paper is to examine the effectiveness of pre-trained embedding models for the classification of emails using deep learning classifiers such as the long short-term memory (LSTM) model and the convolutional neural network (CNN) model.

Design/methodology/approach
In this paper, global vectors (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) pre-trained word embeddings are used to identify relationships between words, which helps to classify emails into their relevant categories using machine learning and deep learning models. Two benchmark datasets, SpamAssassin and Enron, are used in the experimentation.

Findings
In the first set of experiments, the support vector machine (SVM) model performs better than the other machine learning classifiers. The second set of experiments compares deep learning model performance without embedding, with GloVe embedding and with BERT embedding. The experiments show that GloVe embedding can be helpful for faster execution with better performance on large datasets.

Originality/value
The experiments reveal that the CNN model with GloVe embedding gives slightly better accuracy than the model with BERT embedding and than traditional machine learning algorithms when classifying an email as ham or spam. It is concluded that word embedding models improve the accuracy of email classifiers.
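To illustrate what using a pre-trained word embedding means in practice, the following is a minimal sketch in Python, not the authors' pipeline. It parses vectors in the plain-text layout used by GloVe files (a token followed by its vector components on each line), maps the tokens of an email to those vectors, and averages them into a single feature vector that a downstream classifier could consume. The three-dimensional vectors, the sample tokens and the function names (load_vectors, embed_email, mean_pool) are hypothetical and chosen only for illustration; mapping unknown words to zero vectors and mean pooling are assumptions, since the abstract does not say how the paper handles either step.

```python
from typing import Dict, List

# Toy stand-in for a GloVe text file: each line is a token followed by its
# vector components. The values below are made up for illustration only.
TOY_GLOVE = """\
free 0.9 0.1 0.0
prize 0.8 0.2 0.1
meeting 0.1 0.9 0.3
agenda 0.0 0.8 0.4
"""


def load_vectors(text: str) -> Dict[str, List[float]]:
    """Parse 'word v1 v2 ... vd' lines into a word -> vector dictionary."""
    vectors: Dict[str, List[float]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def embed_email(tokens: List[str], vectors: Dict[str, List[float]],
                dim: int) -> List[List[float]]:
    """Look up each token (lowercased); unknown tokens become zero vectors.

    Zero vectors for out-of-vocabulary words are an assumption of this sketch.
    """
    return [vectors.get(tok.lower(), [0.0] * dim) for tok in tokens]


def mean_pool(matrix: List[List[float]], dim: int) -> List[float]:
    """Average the token vectors into one fixed-length feature vector."""
    if not matrix:
        return [0.0] * dim
    return [sum(row[i] for row in matrix) / len(matrix) for i in range(dim)]


if __name__ == "__main__":
    vecs = load_vectors(TOY_GLOVE)
    email_tokens = ["Free", "prize", "inside"]  # "inside" is out of vocabulary
    token_matrix = embed_email(email_tokens, vecs, dim=3)
    print(mean_pool(token_matrix, dim=3))
```

A sequence model such as a CNN or LSTM would normally consume the per-token matrix returned by embed_email rather than the pooled vector; the pooled form is shown because it is the simplest input for a classical classifier such as an SVM.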
Data Technologies and Applications – Emerald Publishing
Published: Aug 23, 2022
Keywords: Email classification; Machine learning; Word embedding; GloVe; BERT; Deep learning