Review of the Methods Used for the Detection of Spam

Spam e-mails are messages sent to recipients without their consent, generally by people with malicious or promotional aims. Because e-mail is easy to use and inexpensive, it is widely exploited by individuals and groups who want to spread propaganda, advertise, or carry out phishing. To achieve these aims, they send unnecessary and unwanted messages to the e-mail accounts of people they do not know. This study reviews the methods proposed in the literature for filtering spam e-mails. The methods are examined under two main headings: those that are not based on artificial intelligence and those that are. Methods not based on artificial intelligence give effective results in spam detection, but the literature also describes techniques that can bypass them. Systems that use artificial intelligence-based machine learning algorithms for spam detection have grown in popularity, and research in this direction has gained momentum. Deep learning methods in particular have begun to be preferred for spam detection because of their high performance. In the literature, spam detection methods that use classical machine learning algorithms such as Bayes, Support Vector Machine, Artificial Neural Network, Random Forest, Multilayer Perceptron, and K-Nearest Neighbour are reported to achieve high performance. Deep learning-based spam detection methods that use Long Short-Term Memory and Convolutional Neural Network algorithms have been shown, on different datasets, to increase performance further. This study also discusses open problems in spam detection systems and the current state of such studies for Turkish, and offers various suggestions.
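The two families of methods named in the abstract can be contrasted with a small sketch. It is illustrative only and is not taken from any of the surveyed studies: the keyword blacklist, the toy training messages, and all function and class names are assumptions made for this example. The first filter is a fixed keyword rule (not based on artificial intelligence); the second is a multinomial naive Bayes classifier with add-one smoothing, one common form of the Bayes approach.

```python
import math
from collections import Counter

# Hypothetical keyword list for the rule-based (non-AI) filter.
BLACKLIST = {"viagra", "lottery", "winner"}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def rule_based_is_spam(text):
    """Flag a message if any of its tokens appears on the blacklist."""
    return any(tok in BLACKLIST for tok in tokenize(text))


class NaiveBayesFilter:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, messages, labels):
        # Count how often each word occurs in each class.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.class_counts = Counter(labels)
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label) over known tokens.
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        for tok in tokens:
            if tok not in self.vocab:
                continue  # words never seen in training carry no evidence
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def is_spam(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")


# Toy training data, invented for this sketch.
messages = [
    "win a free lottery prize now",
    "free prize claim now",
    "meeting agenda for monday",
    "project report attached for review",
]
labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayesFilter().fit(messages, labels)

print(rule_based_is_spam("lottery winner"))    # exact keywords are caught
print(rule_based_is_spam("l0ttery w1nner"))    # obfuscated spelling slips past
print(model.is_spam("claim your free prize"))
print(model.is_spam("monday project meeting"))
```

The obfuscated message shows, in miniature, how a fixed rule can be bypassed. The learned filter is not immune to such tricks either, but it can be retrained on new examples rather than having its rules edited by hand.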

___

DÜMF Mühendislik Dergisi 11:3 (2020): pp. 977-987