XSS Attack Detection with N-Gram Based Prediction Model

Rapid developments in technology have empowered web applications. Meanwhile, the existence of Cross-Site Scripting (XSS) vulnerabilities in web applications has become a concern for users. Despite the numerous existing detection approaches, attackers have been exploiting XSS vulnerabilities for years, causing harm to internet users. This paper introduces a text-mining based approach to detecting XSS attacks in web applications. The approach extracts a set of features from publicly available source code files, which are then used to build a prediction model. The findings include a few comparisons between Word Tokenization and N-Gram tokenization in terms of accuracy, the time spent building the model, and the AUC-ROC curve. The results show that N-Gram tokenization outperforms Word Tokenization.
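
The difference between the two feature-extraction schemes compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say whether the N-grams are built over words or characters, so word-level N-grams are assumed here, and the PHP fragment, the regular expression, and the function names (`word_tokens`, `ngrams`, `feature_counts`) are hypothetical choices made only for illustration.

```python
import re
from collections import Counter
from typing import List

# Hypothetical PHP fragment for illustration only; not taken from the paper's dataset.
SAMPLE = '<?php $name = $_GET["name"]; echo "<div>" . $name . "</div>"; ?>'


def word_tokens(source: str) -> List[str]:
    """Split source code into identifier-like word tokens (an assumed tokenizer)."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Join every run of n consecutive tokens into one feature string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def feature_counts(source: str, max_n: int = 1) -> Counter:
    """Count all n-grams of length 1..max_n; max_n=1 is plain word tokenization."""
    tokens = word_tokens(source)
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


# Word tokenization keeps only single tokens and loses their order.
word_features = feature_counts(SAMPLE, max_n=1)

# N-gram tokenization (here up to trigrams) also records short token sequences,
# such as the path from "_GET" to "echo".
ngram_features = feature_counts(SAMPLE, max_n=3)

print(sorted(word_features))
print(len(word_features), len(ngram_features))
```

In this sketch the N-gram feature set contains every word feature plus the short token sequences, so it is larger and preserves some ordering information. Either set of counts could then be passed to a classifier to build a prediction model.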
