Comparative Analysis of Machine Learning Algorithms for the Detection and Classification of Suspicious Emails

Shamsuddeen J. AHMAD; Saifullahi S. SADI; Muhammad M. AHMAD; Abdullahi D. UMAR; Shamsuddeen USMAN

doi:10.33003/8he28086

Authors

Shamsuddeen J. AHMAD

Department of Computer Science, Kaduna Polytechnic, Kaduna, Nigeria

Saifullahi S. SADI

Department of Cyber Security, Nigerian Defence Academy, Kaduna, Nigeria

Muhammad M. AHMAD

Department of Secure Computing, Kaduna State University, Zaria, Kaduna State, Nigeria

Abdullahi D. UMAR

Department of Secure Computing, Kaduna State University, Zaria, Kaduna State, Nigeria

Shamsuddeen USMAN

Department of Computer Science, Nuhu Bamalli Polytechnic, Zaria, Kaduna State, Nigeria

DOI:

https://doi.org/10.33003/8he28086

Keywords:

Machine Learning, Random Forest, Support Vector Machine, Artificial Neural Network, Artificial Intelligence, Term Frequency-Inverse Document Frequency.

Abstract

The exponential growth of corporate email communications poses significant challenges for digital forensic investigations because manual analysis is slow, resource-intensive, and error-prone. This study compares three machine learning algorithms, Random Forest, Support Vector Machine (SVM), and Artificial Neural Network (ANN), for the detection and classification of suspicious emails. A publicly available dataset of 60,000 instances was obtained from a GitHub repository. Preprocessing consisted of encoding categorical features, converting email body content into numerical representations using Term Frequency-Inverse Document Frequency (TF-IDF) vectorisation, and balancing the classes with the Synthetic Minority Over-sampling Technique (SMOTE). The dataset was then split into 80% (48,000 instances) for training and 20% (12,000 instances) for testing, and each classifier was trained and evaluated on accuracy, precision, recall, F1-score, and AUC. The results indicate that ANN achieved the highest performance (accuracy: 99.86%, AUC: 1.00), with balanced precision and recall across the “Evidence” and “Non-Evidence” classes. Random Forest also performed strongly (accuracy: 99.92%, AUC: 1.00) with high interpretability, while SVM (accuracy: 98.92%, AUC: 1.00) showed strong precision but lower recall for “Non-Evidence” emails. ANN’s performance is attributed to its ability to model complex patterns and handle class imbalance effectively. Overall, the findings identify ANN as the strongest of the three classifiers for suspicious-email classification in terms of accuracy, efficiency, and scalability.
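
To make two of the steps named in the abstract concrete, the following is a minimal sketch of TF-IDF weighting and of the accuracy, precision, recall, and F1 calculations, using only the Python standard library. It is an illustration under stated assumptions, not the study's implementation: the whitespace tokeniser, the smoothed IDF formula, and the function names `tfidf` and `binary_metrics` are hypothetical choices, and SMOTE balancing and the three classifiers are not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumptions (not taken from the study): lower-cased whitespace
    tokenisation, term frequency as a proportion of document length,
    and a smoothed IDF of ln((1 + N) / (1 + df)) + 1.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Calling `binary_metrics` once with `positive="Evidence"` and once with `positive="Non-Evidence"` yields per-class figures of the kind the abstract compares, for example the lower “Non-Evidence” recall reported for SVM.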

Published

24-11-2025

Issue

Vol. 1 No. 2 (2025): December 2025

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

How to Cite

Comparative Analysis of Machine Learning Algorithms for the Detection and Classification of Suspicious Emails. (2025). FUDMA Journal of Engineering and Technology, 1(2), 735-745. https://doi.org/10.33003/8he28086
