A postpruning decision algorithm based on loss minimization
In this paper, a post-pruning method known as zero-one loss function pruning (ZOLFP), based on the zero-one loss function, is introduced. Rather than evaluating the misclassification error rate of a node and its subtree, the proposed ZOLFP method minimizes the expected loss: a subtree is pruned when the expected loss of its root node is less than or equal to the sum of the losses of its leaves. The experimental results demonstrate that the ZOLFP method outperforms the unpruned C4.5 decision tree (UDT-C4.5) algorithm, reduced error pruning (REP), and minimum error pruning (MEP) in terms of accuracy on all datasets used. It is also shown that the complexity of the proposed ZOLFP method does not exceed that of the REP and MEP methods. Furthermore, the results show that ZOLFP achieves satisfactory results compared to the REP, MEP, and UDT-C4.5 algorithms in terms of precision, recall, true positive rate, false positive rate, F-measure, and area under the ROC curve.
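The pruning criterion stated above can be sketched as a bottom-up traversal that collapses a subtree whenever the zero-one loss of predicting the node's majority class is no greater than the summed zero-one losses of the subtree's leaves. This is a minimal illustrative sketch, not the authors' implementation: the `Node` structure and the counts-based loss estimate are assumptions introduced here for clarity.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # class label -> number of training samples of that class reaching this node
    counts: dict
    children: list = field(default_factory=list)  # empty list => leaf

    @property
    def is_leaf(self):
        return not self.children

def zero_one_loss(counts):
    """Expected zero-one loss if this node predicts its majority class:
    the number of samples that would be misclassified."""
    return sum(counts.values()) - max(counts.values())

def subtree_leaf_loss(node):
    """Sum of zero-one losses over the leaves of the subtree rooted at node."""
    if node.is_leaf:
        return zero_one_loss(node.counts)
    return sum(subtree_leaf_loss(c) for c in node.children)

def zolfp_prune(node):
    """Bottom-up pruning: replace a subtree by a leaf when the node's
    expected loss is <= the sum of its leaves' losses."""
    for child in node.children:
        zolfp_prune(child)
    if not node.is_leaf and zero_one_loss(node.counts) <= subtree_leaf_loss(node):
        node.children = []  # the node becomes a leaf predicting its majority class
    return node
```

For example, a node with counts `{'A': 8, 'B': 2}` whose two leaves misclassify 2 and 0 samples has node loss 2 and leaf loss 2 + 0 = 2, so the subtree is pruned; if the leaves jointly misclassified fewer samples than the node would, the subtree is kept.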