Comparison of Performance of Classification Algorithms Using Standard Deviation-based Feature Selection in Cyber Attack Datasets

Supervised machine learning techniques are widely used in many fields, such as finance, education, healthcare, and engineering, because of their ability to learn from past data. However, such techniques can be very slow on high-dimensional datasets, and irrelevant features may also reduce classification success. Feature selection or feature reduction techniques are therefore commonly used to overcome these issues. At the same time, information security is crucial for both people and networks, and it must be ensured without delay. Hence, feature selection approaches are needed that make algorithms faster without reducing classification success. In this study, we compare both the classification success and the run-time performance of fundamental classification algorithms using standard deviation-based feature selection on security datasets. For this purpose, we applied standard deviation-based feature selection to the KDD Cup 99 and Phishing Legitimate datasets to select the most relevant features, and then ran the selected classification algorithms on these datasets to compare the results. According to the results, all algorithms achieved satisfactory classification success, and Decision Tree (DT) performed best among them. In terms of speed, Decision Tree, k-Nearest Neighbors (kNN), and Naïve Bayes (NB) were sufficiently fast, whereas Support Vector Machine (SVM) and Artificial Neural Networks (ANN or NN) were very slow.
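
The abstract does not state the exact selection criterion. The Python sketch below is therefore only one plausible reading of standard deviation-based feature selection, and it is not the author's published implementation. It works in three steps:
  • each feature column is min-max scaled to [0, 1];
  • the population standard deviation of the scaled column is used as the feature's relevance score;
  • the k highest-scoring features are kept.
The following are all assumptions: the min-max step, the use of `statistics.pstdev`, the top-k rule (rather than a fixed threshold), and the helper names `min_max_normalize` and `select_by_std`. Symbolic KDD Cup 99 attributes such as `protocol_type` would also need to be encoded as numbers before this step.

```python
import statistics
from typing import List, Sequence


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale a feature column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def select_by_std(rows: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k features with the largest standard deviation.

    Hypothetical criterion for illustration: features are min-max scaled first,
    then ranked by population standard deviation (not the paper's exact rule).
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        # Population standard deviation of the scaled column is the score.
        scores.append((statistics.pstdev(min_max_normalize(column)), j))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    # Return the selected feature indices in their original column order.
    return sorted(j for _, j in scores[:k])


if __name__ == "__main__":
    # Toy data: 4 samples x 4 features (made-up values, for illustration only).
    X = [
        [0, 10, 5, 1],
        [0, 20, 5, 1],
        [0, 30, 5, 0],
        [0, 90, 5, 1],
    ]
    print(select_by_std(X, k=2))  # → [1, 3]
```

Scaling first matters. Without it, a feature measured in large units, such as a byte count, would dominate the ranking purely because of its range. In the toy data, the binary fourth feature outranks the wide-ranging second one once both are scaled. The two constant columns score zero and are dropped.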

International Journal of Pure and Applied Sciences
  • ISSN: 2149-0910
  • Publication frequency: 2 issues per year
  • Start year: 2015
  • Publisher: Munzur University

Comparison of Performance of Classification Algorithms Using Standard Deviation-based Feature Selection in Cyber Attack Datasets

Ali ŞENOL
