Exploring the effectiveness of word embedding based deep learning model for improving email classification


References (57)

Publisher
Emerald Publishing
Copyright
© Emerald Publishing Limited
ISSN
2514-9288
DOI
10.1108/dta-07-2021-0191

Abstract

Purpose: Classifying emails as ham or spam based on their content is essential. The most difficult challenge in email categorization is determining the semantic and syntactic meaning of words and representing them as high-dimensional feature vectors for processing. The purpose of this paper is to examine the effectiveness of pre-trained embedding models for email classification using deep learning classifiers such as the long short-term memory (LSTM) model and the convolutional neural network (CNN) model.

Design/methodology/approach: Global vectors (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) pre-trained word embeddings are used to identify relationships between words, which helps to classify emails into their relevant categories using machine learning and deep learning models. Two benchmark datasets, SpamAssassin and Enron, are used in the experimentation.

Findings: In the first set of experiments, among the machine learning classifiers, the support vector machine (SVM) model performs better than the other machine learning methodologies. The second set of experiments compares deep learning model performance without embedding, with GloVe embedding and with BERT embedding. The experiments show that GloVe embedding can be helpful for faster execution with better performance on large datasets.

Originality/value: The experiments reveal that the CNN model with GloVe embedding gives slightly better accuracy than the model with BERT embedding and than traditional machine learning algorithms when classifying an email as ham or spam. It is concluded that word embedding models improve the accuracy of email classifiers.
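To make the approach concrete, the following is a minimal sketch of a GloVe-based CNN email classifier of the kind described above, written with TensorFlow/Keras. It is an illustrative assumption rather than the authors' implementation: the embedding file glove.6B.100d.txt, the vocabulary size, sequence length and network hyperparameters are example choices, and the helpers load_glove and build_cnn are invented for this sketch.

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras import layers, models

MAX_WORDS, MAX_LEN, EMB_DIM = 20000, 200, 100  # assumed vocabulary size, sequence length, GloVe dimension

def load_glove(path, word_index, dim=EMB_DIM):
    # Build an embedding matrix from a GloVe text file (e.g. glove.6B.100d.txt).
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    matrix = np.zeros((MAX_WORDS, dim))
    for word, i in word_index.items():
        if i < MAX_WORDS and word in vectors:
            matrix[i] = vectors[word]
    return matrix

def build_cnn(embedding_matrix):
    # 1D CNN over word embeddings with a sigmoid output for ham (0) / spam (1).
    model = models.Sequential([
        layers.Embedding(MAX_WORDS, EMB_DIM, weights=[embedding_matrix],
                         input_length=MAX_LEN, trainable=False),
        layers.Conv1D(128, 5, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Usage sketch: `emails` is a list of raw message bodies, `labels` is 0 (ham) / 1 (spam).
# tokenizer = Tokenizer(num_words=MAX_WORDS)
# tokenizer.fit_on_texts(emails)
# X = pad_sequences(tokenizer.texts_to_sequences(emails), maxlen=MAX_LEN)
# model = build_cnn(load_glove("glove.6B.100d.txt", tokenizer.word_index))
# model.fit(X, np.array(labels), validation_split=0.2, epochs=5)

Freezing the embedding layer (trainable=False) keeps the pre-trained GloVe word relationships intact during training; fine-tuning it instead is an equally reasonable variant.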

Journal

Data Technologies and Applications, Emerald Publishing

Published: Aug 23, 2022

Keywords: Email classification; Machine learning; Word embedding; GloVe; BERT; Deep learning
