Efficient Email Spam Detection Using Machine Learning Techniques: A Comparative Analysis of Classification Models
International Journal of Intelligent Computing and Information Sciences
Volume 24, Issue 4, December 2024, Pages 1-15
Document Type: Original Article
DOI: 10.21608/ijicis.2024.321043.1355
Authors
Md Nurul Raihen
1Department of Mathematics and Computer Science, Fontbonne University, Saint Louis, MO, USA
2Department of Statistics, Western Michigan University, Kalamazoo, MI, USA
3Institute for Data Science and Informatics, University of Missouri, Columbia, MO, USA
4Department of Mathematics, University of Houston, Texas, USA
Abstract
Spam emails pose a significant challenge to digital communication by compromising user privacy and security. This study compares classical machine learning and modern deep learning models for email spam detection on a publicly available Kaggle dataset of more than 5,000 emails. Among the classical classifiers, the Support Vector Machine (SVM) performed best, with an accuracy of 99.0% and an F1-score of 0.97, underscoring its robustness and its ability to generalize. Logistic Regression was competitive at 98.4% accuracy and has the added advantage of interpretability, which allows a detailed analysis of feature importance. Four transformer-based deep learning models were also evaluated: BERT, DistilBERT, RoBERTa, and XLNet. BERT achieved the highest accuracy among them, 98.8%, with an F1-score of 0.97, reflecting its ability to capture contextual nuances in text. Additional evaluation metrics, including precision, recall, and specificity, were used to compare the models more completely. To support practical deployment, a user-friendly interface was developed for real-time email classification. These findings show that both classical and modern approaches are effective for spam detection and offer guidance for building scalable, real-time email security applications.
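The evaluation metrics named in the abstract (accuracy, precision, recall, specificity, and F1-score) can all be derived from the four confusion-matrix counts of a binary spam/ham classifier. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `spam_metrics`, the label strings "spam" and "ham", and the toy labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    specificity: float
    f1: float


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a denominator is zero.
    return numerator / denominator if denominator else 0.0


def spam_metrics(y_true, y_pred, positive="spam"):
    """Compute binary metrics, treating `positive` as the spam class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # spam correctly flagged
            else:
                fp += 1  # legitimate mail flagged as spam
        else:
            if truth == positive:
                fn += 1  # spam that slipped through
            else:
                tn += 1  # legitimate mail correctly passed

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        precision=precision,
        recall=recall,
        specificity=_ratio(tn, tn + fp),
        f1=_ratio(2 * precision * recall, precision + recall),
    )


# Hypothetical toy labels, not data from the study.
y_true = ["spam", "ham", "ham", "spam", "ham", "spam", "ham", "ham"]
y_pred = ["spam", "ham", "spam", "spam", "ham", "ham", "ham", "ham"]
m = spam_metrics(y_true, y_pred)
print(m)
```

Specificity is worth reporting next to recall in this setting because it measures how often legitimate mail is passed through correctly, which accuracy alone can hide when spam is the minority class.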
Keywords
Email Spam; Machine Learning; Deep Learning; Text Classification; Spam Filter