Comparison of Classification Techniques on Energy Efficiency Dataset

Data mining can be defined as the extraction of information or knowledge from large volumes of data. Statistical and machine learning techniques are used to build the models that underlie data mining predictions. Today, data mining is applied in many areas, such as science and engineering, health, commerce, shopping, banking and finance, education, and the internet. This study makes use of WEKA (Waikato Environment for Knowledge Analysis) to compare different classification techniques on an energy efficiency dataset. Ten data mining methods, namely Bagging, Decorate, Rotation Forest, J48, NNge, K-Star, Naïve Bayes, Dagging, Bayes Net, and JRip, were applied to the energy efficiency dataset taken from the UCI Machine Learning Repository. Comparing the performance of the algorithms showed that Rotation Forest achieved the highest accuracy, whereas Dagging had the lowest.
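The comparison described above can be sketched outside WEKA as well. The following is a minimal illustration, not the paper's actual experiment: it uses scikit-learn analogues of a few of the listed methods (J48 is approximated by a decision tree, Naïve Bayes by GaussianNB, Bagging by BaggingClassifier; Rotation Forest, NNge, and the others have no direct scikit-learn equivalent) and a synthetic stand-in dataset, since the UCI energy efficiency file is not bundled here. The `compare_classifiers` helper is a hypothetical name introduced for this sketch.

```python
# Hedged sketch of the classifier comparison, assuming scikit-learn analogues:
# J48 ~ DecisionTreeClassifier, Naive Bayes ~ GaussianNB, Bagging ~ BaggingClassifier.
# A synthetic dataset stands in for the UCI energy efficiency data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, cv=10):
    """Return mean cross-validated accuracy for each candidate classifier."""
    models = {
        "J48-like tree": DecisionTreeClassifier(random_state=0),
        "Naive Bayes": GaussianNB(),
        "Bagging": BaggingClassifier(random_state=0),
    }
    # 10-fold cross-validation, as is common in WEKA comparisons.
    return {name: cross_val_score(m, X, y, cv=cv).mean()
            for name, m in models.items()}


# Synthetic stand-in: 300 samples, 8 features (the UCI dataset has 8 inputs).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
scores = compare_classifiers(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

Ranking the resulting accuracies mirrors how the paper identifies the best- and worst-performing methods; on the real dataset the WEKA implementations and the full set of ten algorithms would be used instead.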
