Smart frequent itemsets mining algorithm based on FP-tree and DIFFset data structures

Smart frequent itemsets mining algorithm based on FP-tree and DIFFset data structures

Association rule data mining is an important technique for finding important relationships in large datasets. Several frequent itemsets mining techniques have been proposed using a prefix-tree structure, FP-tree, a compressed data structure for database representation. The DIFFset data structure has also been shown to significantly reduce the run time and memory utilization of some data mining algorithms. Experimental results have demonstrated the efficiency of the two data structures in frequent itemsets mining. This work proposes FDM, a new algorithm based on FP-tree and DIFFset data structures for efficiently discovering frequent patterns in data. FDM can adapt its characteristics to efficiently mine long and short patterns from both dense and sparse datasets. Several optimization techniques are also outlined to increase the efficiency of FDM. An evaluation of FDM against three frequent itemset data mining algorithms, dEclat, FP-growth, and FDM* (FDM without optimization), was performed using datasets having both long and short frequent patterns. The experimental results show significant improvement in performance compared to the FP-growth, dEclat, and FDM* algorithms.

___

  • [1] Agrawal R, Imielinski T, Swami AN. Mining association rules between sets of items in large databases. In: ACM SIGMOD International Conference on Management of Data; May 1993; Washington, DC, USA. New York, NY, USA: ACM. pp. 207-216.
  • [2] Agrawal R, Srikant R. Fast algorithms for mining association rules. In: 20th International Conference on Very Large Data Bases; 1994; Santiago, Chile. San Francisco, CA, USA: Morgan Kaufmann, pp. 487-499.
  • [3] Savasere A, Omiecinski E, Navathe S. An efficient algorithm for mining association rules in large databases. In: Proceedings of the 21st International Conference on Very Large Data Bases; 1995; Zurich, Switzerland. San Francisco, CA, USA: Morgan Kaufmann, pp. 432-444.
  • [4] Gunopulos D, Khardoni R, Mannila H, Saluja S, Toivonen H, Sewark R. Discovering all the most speci?c sentences. ACM T Database Syst 2003;28: 140-174.
  • [5] Xiang C, Sen S, Shengzhi X, Zhengyi L. DP-Apriori: A differentially private frequent itemset mining algorithm based on transaction splitting. Comput Secur 2015; 50: 74-90.
  • [6] Akshita B, Ashutosh G, Debasis D. Improvised apriori algorithm using frequent pattern tree for real time applications in data mining. Procedia Computer Science 2015; 46: 644-651.
  • [7] Wu X, Kumar V, Ross Quinlan J, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS et al. Top 10 algorithms in data mining. Knowl Inf Syst 2007; 14: 1-37.
  • [8] Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: SIGMOD Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data; 2000; Dallas, TX, USA. pp. 1-12.
  • [9] Pei J, Han J, Lu H, Nishio S, Tang S, Yang D. Hmine: Hyper-structure mining of frequent patterns in large databases. In: Proceedings of the International Conference on Data Mining; November 2001; San Jose, CA, USA. pp. 441-448.
  • [10] Zaki M, Parthasarathy S, Ogihara M, Li W. New algorithms for fast discovery of association rules. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining; 1997; Menlo Park, CA, USA. pp. 283-286.
  • [11] Burdick D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of International Conference on Data Engineering; April 2001; Heidelberg, Germany. pp. 443-452.
  • [12] Zaki MJ, Gouda K. Fast vertical mining using DIFFsets. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2003; Washington, DC, USA. New York, NY, USA: ACM Press. pp. 326-335.
  • [13] Han J, Pei J, Yin Y, Mao R. Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Disc 2004; 8: 53-87.
  • [14] Zaki MJ, Parthasarathy S, Ogihara M, Li W. New Algorithms for Fast Discovery of Association Rules. Technical Report URCS TR 651. Rochester, NY, USA: University of Rochester, 1997.
  • [15] Lan V, Gita A. Novel parallel method for mining frequent patterns on multi-core shared memory systems. In: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems; 2013; New York, NY, USA. pp. 49-54.
  • [16] Lan V, Gita A. Mining frequent patterns based on data characteristics. In: International Conference on Information and Knowledge Engineering; 16–19 July 2012; Las Vegas, NV, USA. pp. 369-375.
  • [17] Lan V, Gita A. Novel parallel method for association rule mining on multi-core shared memory systems. Parallel Comput 2014; 40: 768-785.
  • [18] Borgelt C. An implementation of the FP-growth algorithm. In: Workshop on Open Source Data Mining Software; 2005; New York, NY, USA. pp. 1-5.
  • [19] Shporer S. AIM2: Improved implementation of AIM. In: Proceedings of Workshop on FIMI; November 2004; Brighton, UK. pp. 23-32.
  • [20] Fournier-Viger PG, Gueniche T, Soltani A, Wu C, Tseng VS. SPMF: A Java open-source pattern mining library. J Mach Learn Res 2014; 15: 3389-3393.
  • [21] Lin JCW, Gan W, Fournier-Viger P, Hong TP. RWFIM: Recent weighted-frequent itemsets mining. Eng Appl Artif Intel 2015; 45: 18-32.
Turkish Journal of Electrical Engineering and Computer Sciences-Cover
  • ISSN: 1300-0632
  • Yayın Aralığı: Yılda 6 Sayı
  • Yayıncı: TÜBİTAK
Sayıdaki Diğer Makaleler

Intrusion detection in network flows based on an optimized clustering criterion

Jaber KARIMPOUR, Shahriar LOTFI, Aliakbar SIAHMARZKOOH TAJARI

Adaptive sliding mode with time delay control based on convolutions for power flow reference tracking using a VSC-HVDC system

Hachemi CHEKIREB, Amar HAMACHE, Mohand Outaher BENSIDHOUM

Reconstruction of a single square pulse originally having 40 ps width coming from a lossy and noisy channel in a point to point interconnect

Alak MAJUMDER, Bidyut BHATTACHARYYA

A neuro-fuzzy controller for grid-connected heavy-duty gas turbine power plants

Mohamed Mustafa IQBAL MOHAMED, Rayappan XAVIER JOSEPH, Jagannathan KANAKARAJ

Implementation of energy management and demand side management of a solar microgrid using a hybrid platform

Milton SAKAYA, Senthilkumaran MAHADEVAN, Leo RAJU

Effect of touch coordinate display as a form of augmented, concurrent visual feedback on the accuracy of single-handed typing via smartphone virtual keyboards

Bora ERGİN, Abdullah Ruhi SOYLU, Görkem YAVAŞ, Sumru KEÇELİ

Design and implementation of a six-port junction based on substrate integrated waveguide

Gholamreza MORADI, Masoud JAFARI, Reza SHIRAZI SARRAF, Rashid MIRZAVAND

Design optimization of a Cuk DC/DC converter based on reliability constraints

Amirreza GHAREHKOUSHAN ZARRIN, Mehdi ABAPOUR, Amir FARAKHOR

Optimum design and operation analysis of permanent magnet-assisted synchronous reluctance motor

Hassan KHAJEROSHANAEE, Mohsen NIASATI, Jamal ASHKEZARI DEHGHANI, Mohammad JAFAR MOJIBIAN

Comparative study of conventional modulation schemes in terms of conducted and radiated EMI generated by three-phase inverters

Mahmoud HAMOUDA, Mohamed SALEM, Jaleleddine SLAMA HADJ BEN