Vinothsaravanan RAMAKRISHNAN, Chenniappan PALANISAMY

BIBSQLQC: Brown infomax boosted SQL query clustering algorithm to detect anti-patterns in the query log

Discovering antipatterns in an arbitrary SQL query log relies on static code analysis, which is used to improve the quality and performance of software applications. The presence of antipatterns reduces quality and leads to redundant SQL statements. A SQL query log records a large load on the database, and it is difficult for an analyst to extract large patterns in a short time. Existing techniques for discovering antipatterns in SQL queries face numerous challenges in identifying the normal sequences of queries within the log. To discover the antipatterns in the log, an efficient technique called Brown infomax boosted SQL query clustering (BIBSQLQC) is introduced. First, a number of patterns (i.e. queries) are extracted from the SQL query log. An ensemble clustering process is then carried out to find the antipatterns in the given query log. Brown infomax boost clustering is an ensemble learning method that groups the patterns by constructing several weak learners. Brown clustering is used as the weak learner and partitions the patterns into ‘k’ clusters based on the Euclidean distance measure. The weak learner then merges the two clusters with the maximum information gain in order to minimize the time complexity. The clustering results of the weak learners are combined into a strong result with a minimal error rate (ER). In this way, antipatterns in the SQL query log are detected with higher accuracy. The experimental evaluation uses three metrics, namely detection accuracy (DA), false positive rate (FPR), and time complexity (TC), on two SQL query log data sets (DS). The experimental results show that the BIBSQLQC technique achieves higher DA with lower TC and FPR than conventional methods.
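
The abstract names DA and FPR without defining them. As a minimal sketch, assuming the standard confusion-matrix definitions (DA as the share of queries labelled correctly, FPR as the share of normal queries wrongly flagged), the two metrics could be computed as follows. The function name `detection_metrics` and the boolean label encoding are illustrative assumptions, not taken from the paper.

```python
def detection_metrics(predicted, actual):
    """Return (detection accuracy, false positive rate).

    Both arguments are equal-length sequences of booleans, where True means
    "this query is an antipattern". The definitions used here are the usual
    confusion-matrix ones and are an assumption; the paper may differ.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)            # antipattern, flagged
    tn = sum(1 for p, a in pairs if not p and not a)    # normal, not flagged
    fp = sum(1 for p, a in pairs if p and not a)        # normal, wrongly flagged
    fn = sum(1 for p, a in pairs if not p and a)        # antipattern, missed

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, false_positive_rate


if __name__ == "__main__":
    # Hypothetical labels for five logged queries.
    predicted = [True, True, False, False, True]
    actual = [True, False, False, False, True]
    da, fpr = detection_metrics(predicted, actual)
    print(f"DA = {da:.2f}, FPR = {fpr:.2f}")
```

Under these assumed definitions, a method that improves DA while lowering FPR flags more true antipatterns without mislabelling more normal queries.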
