Comparative analysis of machine learning algorithms in detection of phishing websites

As the use of web applications grows, the number of malicious websites and attacks is rising as well, causing serious harm to end users. Phishing is one such attack, aimed at stealing personal and sensitive information. Published security reports state that millions of new phishing web pages have been detected in recent years. In such a critical situation, detecting these web pages is of great importance. In this study, a comparative analysis of machine learning classification algorithms from the literature was carried out on a dataset. The results of the analysis show that different factors determine the conditions under which each of the classification algorithms used in phishing studies should be preferred.
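The kind of comparison described above can be illustrated with a minimal sketch. It is not the study's actual experimental setup: the synthetic data, the feature encoding in {-1, 0, 1}, the toy labelling rule, and the two simple classifiers (a majority-class baseline and a 1-nearest-neighbour classifier) are all assumptions chosen for illustration. The sketch only shows the general pattern of scoring several classifiers on the same dataset with k-fold cross-validation.

```python
import random
from collections import Counter


def make_toy_data(n=200, n_features=6, seed=0):
    """Build synthetic stand-in data (an assumption, not the study's dataset).

    Each feature takes a value in {-1, 0, 1}; label 1 = phishing, 0 = legitimate.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.choice((-1, 0, 1)) for _ in range(n_features)]
        # Toy labelling rule: a negative feature sum is treated as phishing.
        y = 1 if sum(x) < 0 else 0
        rows.append((x, y))
    return rows


def majority_classifier(train):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def one_nn_classifier(train):
    """1-nearest neighbour using the count of differing features as distance."""
    def predict(x):
        nearest = min(train, key=lambda row: sum(a != b for a, b in zip(row[0], x)))
        return nearest[1]
    return predict


def cross_validate(make_model, data, k=5):
    """Return the mean accuracy over k folds."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        predict = make_model(train)
        correct = sum(predict(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k


data = make_toy_data()
results = {
    "majority baseline": cross_validate(majority_classifier, data),
    "1-nearest neighbour": cross_validate(one_nn_classifier, data),
}
# Report the classifiers from highest to lowest mean accuracy.
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean accuracy {acc:.3f}")
```

Accuracy is the only measure reported here. A fuller comparison would typically add further measures and more classifiers, since a classifier that ranks best on one measure need not rank best on another.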
