A Two Stage Hybrid Ensemble Classifier Based Diagnostic Tool for Chronic Kidney Disease Diagnosis Using Optimally Selected Reduced Feature Set

This paper proposes a two-stage hybrid ensemble classifier to improve the prediction accuracy of machine-learning-based automated diagnosis of chronic kidney disease from the values of an optimally selected subset of clinical and physiological parameters. Chronic kidney disease is a general term for various heterogeneous disorders affecting the structure and function of the kidney, and it carries a high mortality rate. The two-stage hybrid ensemble combines the strengths of individual classification algorithms. In addition, the authors select 8 parameters of prime importance from the 24 parameters in the dataset used for the study. The selected parameters (features) are the intersection of two sets: one containing medically essential parameters ranked in decreasing order of their contribution to the diagnosis, and the other containing parameters ranked in decreasing order of their contribution to the machine learning classification process. On both the optimally selected reduced feature set (8 parameters) and the complete feature set (24 parameters), the ensemble classifier achieves a two-class predictive accuracy of 100%, with sensitivity, precision, specificity, and F-value all equal to 1. The GUI-based diagnostic tool built on the proposed ensemble can assist doctors in cross-validating their initial screening findings for chronic kidney disease using fewer clinical parameters, helping them attend to more patients in less time.
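
The feature-selection idea and the reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `top_k` cutoff applied to each ranking, and the placeholder feature labels are all assumptions made for illustration, since the abstract does not state how the two rankings were truncated before intersecting them.

```python
def select_features(medical_rank, ml_rank, top_k):
    """Keep features that appear in the top_k of BOTH rankings.

    medical_rank / ml_rank: feature names sorted by decreasing importance.
    top_k is a hypothetical cutoff; the abstract does not specify one.
    The result preserves the order of the medical ranking.
    """
    ml_top = set(ml_rank[:top_k])
    return [f for f in medical_rank[:top_k] if f in ml_top]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute the two-class metrics named in the abstract."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall on the positive class
    precision = ratio(tp, tp + fp)
    specificity = ratio(tn, tn + fp)   # recall on the negative class
    f_value = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "f_value": f_value,
    }


if __name__ == "__main__":
    # Placeholder feature labels, not the parameters chosen in the paper.
    medical = ["a", "b", "c", "d"]
    learned = ["c", "a", "e", "b"]
    print(select_features(medical, learned, top_k=3))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

A classifier that labels every test case correctly yields 1.0 for all five metrics, which corresponds to the 100% accuracy and unit sensitivity, precision, specificity, and F-value reported above.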
