Recurrent Neural Networks for Spam E-mail Classification on an Agglutinative Language

In this study, we provide an alternative solution to the problem of classifying spam and legitimate e-mails. Several deep learning architectures are applied to features obtained with two feature selection methods: Mutual Information (MI) and Weighted Mutual Information (WMI). First, the feature selection methods, MI and WMI, are applied to reduce the number of selected terms. Second, feature vectors are constructed using the bag-of-words (BoW) model. Finally, the performance of the system is analyzed using Artificial Neural Network (ANN), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BiLSTM) models. The experimental simulations show that WMI and MI yield competitive accuracy rates on an agglutinative language, namely Turkish. The LSTM and BiLSTM models reach 100% accuracy on spam and legitimate e-mails when combined with either MI or WMI features. However, for particular cross-validation folds, WMI features outperform MI features in e-mail grouping. Overall, MI and WMI combined with deep learning architectures appear robust for spam e-mail detection, given the high detection scores obtained.
