Murat KOCA, İsa AVCI, Mohammed Abdulkareem Shakir AL-HAYANİ

Classification of Malicious URLs Using Naive Bayes and Genetic Algorithm

The financial losses caused by vulnerable and insecure websites are increasing day by day. The system proposed in this research presents a strategy based on factor analysis of website categories and accurate identification of unknown information to classify websites as safe or dangerous and to protect users from the latter. Probability calculations based on Naive Bayes and other powerful approaches are used throughout the website classification procedure to train and evaluate the classification model. In our study, the Naive Bayes approach proved effective and produced successful results compared with the other methods tested. This strategy is well optimized for the problem of distinguishing secure websites from unsafe ones. The training model for categorizing the vulnerability data in this data set achieved a higher degree of precision. The best accuracy in this study, 96%, was achieved by Naive Bayes classification of the NSL-KDD data set.
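The abstract does not specify the feature set or the exact Naive Bayes variant, so the following is only a minimal sketch under stated assumptions: a Bernoulli Naive Bayes classifier with Laplace smoothing over a few hypothetical lexical URL features. The names `extract_features`, `BernoulliNaiveBayes` and `accuracy`, the features themselves, and the six toy URLs are illustrative inventions, not the authors' code and not the NSL-KDD data set. The sketch shows the kind of probability calculation the abstract refers to: each class c is scored by log P(c) + Σ log P(x_f | c), and the highest-scoring class is predicted.

```python
import math
import re


# Hypothetical lexical features (an assumption; the paper's real features are not listed).
def extract_features(url):
    lowered = url.lower()
    host = re.sub(r"^[a-z]+://", "", lowered).split("/")[0]
    return {
        "ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}(:\d+)?", host)),
        "has_at": "@" in url,
        "long_url": len(url) > 75,
        "hyphen_in_host": "-" in host,
        "no_https": not lowered.startswith("https://"),
        "bait_word": any(w in lowered for w in ("login", "verify", "account", "update")),
    }


class BernoulliNaiveBayes:
    """Bernoulli Naive Bayes over boolean features with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, samples, labels):
        self.classes = sorted(set(labels))
        self.features = sorted(samples[0])
        n = len(labels)
        self.log_prior = {}
        self.p_true = {}  # p_true[c][f] = smoothed P(feature f is True | class c)
        for c in self.classes:
            rows = [s for s, y in zip(samples, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / n)
            self.p_true[c] = {
                f: (sum(r[f] for r in rows) + self.alpha) / (len(rows) + 2 * self.alpha)
                for f in self.features
            }
        return self

    def log_scores(self, sample):
        # Unnormalised log posterior: log P(c) + sum over f of log P(x_f | c).
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for f in self.features:
                p = self.p_true[c][f]
                score += math.log(p if sample[f] else 1.0 - p)
            scores[c] = score
        return scores

    def predict(self, sample):
        scores = self.log_scores(sample)
        return max(scores, key=scores.get)


def accuracy(model, samples, labels):
    correct = sum(model.predict(s) == y for s, y in zip(samples, labels))
    return correct / len(labels)


# Toy, hand-made training data for illustration only.
train_urls = [
    ("https://www.example.com/docs/index.html", "safe"),
    ("https://en.wikipedia.org/wiki/Naive_Bayes_classifier", "safe"),
    ("https://github.com/about", "safe"),
    ("http://192.168.4.20/paypal/login/verify.php", "malicious"),
    ("http://secure-account-update.example.net/login", "malicious"),
    ("http://bank.example.com@10.0.0.7/verify/account", "malicious"),
]
X = [extract_features(u) for u, _ in train_urls]
y = [label for _, label in train_urls]
model = BernoulliNaiveBayes().fit(X, y)

print(model.predict(extract_features("http://203.0.113.9/login/verify")))  # → malicious
print(accuracy(model, X, y))  # → 1.0 (training-set accuracy on toy data only)
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow when many features are used, and the smoothing term keeps an unseen feature value from zeroing out a class. The 1.0 printed above is accuracy on the six training URLs themselves and says nothing about the 96% reported in the study.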

Keywords: HTML, Malicious, Naive Bayes, Machine learning, URL, Neural Network


___


Sakarya University Journal of Computer and Information Sciences

Publication frequency: 3 issues per year
First published: 2018
Publisher: Sakarya University

