SOTARM: Size of transaction-based association rule mining algorithm


Association rule mining tries to identify promising and useful relations among the items present in a database. The basic Apriori algorithm requires multiple database scans, so when the database is large, the time taken for scanning and candidate generation is also large. The proposed algorithm attempts to reduce repeated scanning of the whole database. It reduces both the scanning time and the generation of item subsets that are not frequent. The former is achieved by sorting the transaction records in descending order of size of transaction (SOT) and, when counting itemsets of size k, scanning only those transactions whose SOT is greater than or equal to k. The latter is achieved by analyzing the itemset state: itemsets that are not frequent are not used to generate the next set of candidate itemsets. Both positive and negative mining were performed in RStudio using the R language. Experimental results show that the SOT algorithm performs better than the Apriori, Eclat, PVARM (partition-based validation for association rule mining), and NRRM (nonredundant rule method) algorithms. The work was tested on various standard datasets, such as Adult, Genome, Groceries, and SER (State Electricity Rate) Prediction. The speed-up and efficiency values obtained strongly suggest that the proposed SOTARM algorithm achieves better performance than the other existing algorithms it was compared against.
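The two reductions described above can be illustrated with a short sketch. This is not the authors' R implementation. It is a minimal Apriori-style miner written in Python, assuming absolute support counts and a standard join of frequent (k-1)-itemsets; the function name `sot_frequent_itemsets` and the toy transactions are hypothetical. It sorts transactions once by descending SOT, stops each level-k pass at the first transaction with fewer than k items (such a transaction cannot contain a k-itemset), and builds candidates only from frequent itemsets.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Itemset = FrozenSet[str]


def sot_frequent_itemsets(
    transactions: Iterable[Set[str]], min_support: int
) -> Tuple[Dict[Itemset, int], Dict[int, int]]:
    """Hypothetical sketch: Apriori-style mining with SOT-based scan pruning.

    Returns (frequent itemset -> absolute support, transactions scanned per level k).
    """
    # Sort transactions by size of transaction (SOT), largest first.
    db: List[Itemset] = sorted(
        (frozenset(t) for t in transactions), key=len, reverse=True
    )
    scanned: Dict[int, int] = {1: len(db)}

    # Level 1: count every single item over the whole database.
    counts: Dict[Itemset, int] = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Build k-candidates only from frequent (k-1)-itemsets and drop any
        # candidate that has an infrequent (k-1)-subset.
        candidates: Set[Itemset] = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        if not candidates:
            break

        # A transaction with fewer than k items cannot contain a k-itemset.
        # Because db is sorted by descending SOT, stop at the first such one.
        counts = {cand: 0 for cand in candidates}
        n_scanned = 0
        for t in db:
            if len(t) < k:
                break
            n_scanned += 1
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        scanned[k] = n_scanned

        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1

    return result, scanned


# Toy market-basket data (illustrative only, not one of the paper's datasets).
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent, scanned = sot_frequent_itemsets(transactions, min_support=3)
for itemset, support in sorted(
    frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
):
    print(sorted(itemset), support)
print("transactions scanned per level:", scanned)
# transactions scanned per level: {1: 5, 2: 5, 3: 4}
```

On this toy data the level-3 pass skips the two-item transaction, so only four of the five transactions are scanned, and the candidates {bread, diaper, beer} and {milk, diaper, beer} are never counted because each has an infrequent 2-subset. Sorting once up front is what lets each pass stop early instead of testing every transaction's length.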

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK