Tao JIANG, George GATUHA

Smart frequent itemsets mining algorithm based on FP-tree and DIFFset data structures

Association rule data mining is an important technique for finding important relationships in large datasets. Several frequent itemsets mining techniques have been proposed using a prefix-tree structure, FP-tree, a compressed data structure for database representation. The DIFFset data structure has also been shown to significantly reduce the run time and memory utilization of some data mining algorithms. Experimental results have demonstrated the efficiency of the two data structures in frequent itemsets mining. This work proposes FDM, a new algorithm based on FP-tree and DIFFset data structures for efficiently discovering frequent patterns in data. FDM can adapt its characteristics to efficiently mine long and short patterns from both dense and sparse datasets. Several optimization techniques are also outlined to increase the efficiency of FDM. An evaluation of FDM against three frequent itemset data mining algorithms, dEclat, FP-growth, and FDM* (FDM without optimization), was performed using datasets having both long and short frequent patterns. The experimental results show significant improvement in performance compared to the FP-growth, dEclat, and FDM* algorithms.

PDF

___

[1] Agrawal R, Imielinski T, Swami AN. Mining association rules between sets of items in large databases. In: ACM SIGMOD International Conference on Management of Data; May 1993; Washington, DC, USA. New York, NY, USA: ACM. pp. 207-216.
[2] Agrawal R, Srikant R. Fast algorithms for mining association rules. In: 20th International Conference on Very Large Data Bases; 1994; Santiago, Chile. San Francisco, CA, USA: Morgan Kaufmann, pp. 487-499.
[3] Savasere A, Omiecinski E, Navathe S. An efficient algorithm for mining association rules in large databases. In: Proceedings of the 21st International Conference on Very Large Data Bases; 1995; Zurich, Switzerland. San Francisco, CA, USA: Morgan Kaufmann, pp. 432-444.
[4] Gunopulos D, Khardoni R, Mannila H, Saluja S, Toivonen H, Sewark R. Discovering all the most speci?c sentences. ACM T Database Syst 2003;28: 140-174.
[5] Xiang C, Sen S, Shengzhi X, Zhengyi L. DP-Apriori: A differentially private frequent itemset mining algorithm based on transaction splitting. Comput Secur 2015; 50: 74-90.
[6] Akshita B, Ashutosh G, Debasis D. Improvised apriori algorithm using frequent pattern tree for real time applications in data mining. Procedia Computer Science 2015; 46: 644-651.
[7] Wu X, Kumar V, Ross Quinlan J, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS et al. Top 10 algorithms in data mining. Knowl Inf Syst 2007; 14: 1-37.
[8] Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: SIGMOD Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data; 2000; Dallas, TX, USA. pp. 1-12.
[9] Pei J, Han J, Lu H, Nishio S, Tang S, Yang D. Hmine: Hyper-structure mining of frequent patterns in large databases. In: Proceedings of the International Conference on Data Mining; November 2001; San Jose, CA, USA. pp. 441-448.
[10] Zaki M, Parthasarathy S, Ogihara M, Li W. New algorithms for fast discovery of association rules. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining; 1997; Menlo Park, CA, USA. pp. 283-286.
[11] Burdick D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of International Conference on Data Engineering; April 2001; Heidelberg, Germany. pp. 443-452.
[12] Zaki MJ, Gouda K. Fast vertical mining using DIFFsets. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2003; Washington, DC, USA. New York, NY, USA: ACM Press. pp. 326-335.
[13] Han J, Pei J, Yin Y, Mao R. Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Disc 2004; 8: 53-87.
[14] Zaki MJ, Parthasarathy S, Ogihara M, Li W. New Algorithms for Fast Discovery of Association Rules. Technical Report URCS TR 651. Rochester, NY, USA: University of Rochester, 1997.
[15] Lan V, Gita A. Novel parallel method for mining frequent patterns on multi-core shared memory systems. In: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems; 2013; New York, NY, USA. pp. 49-54.
[16] Lan V, Gita A. Mining frequent patterns based on data characteristics. In: International Conference on Information and Knowledge Engineering; 1619 July 2012; Las Vegas, NV, USA. pp. 369-375.
[17] Lan V, Gita A. Novel parallel method for association rule mining on multi-core shared memory systems. Parallel Comput 2014; 40: 768-785.
[18] Borgelt C. An implementation of the FP-growth algorithm. In: Workshop on Open Source Data Mining Software; 2005; New York, NY, USA. pp. 1-5.
[19] Shporer S. AIM2: Improved implementation of AIM. In: Proceedings of Workshop on FIMI; November 2004; Brighton, UK. pp. 23-32.
[20] Fournier-Viger PG, Gueniche T, Soltani A, Wu C, Tseng VS. SPMF: A Java open-source pattern mining library. J Mach Learn Res 2014; 15: 3389-3393.
[21] Lin JCW, Gan W, Fournier-Viger P, Hong TP. RWFIM: Recent weighted-frequent itemsets mining. Eng Appl Artif Intel 2015; 45: 18-32.