Soheila Saeedi, Sorayya Rezayi, Keivan Maghooli

Applying Data Mining Approaches for Chronic Kidney Disease Diagnosis

Kidney disease is one of the most common health problems worldwide. In this study, our main objective is to use several computational algorithms to classify and diagnose chronic kidney disease. We used a publicly available chronic kidney disease dataset. Eight classifiers were applied to assign each record to one of two groups (patient or not). The analysis was run in RapidMiner Studio 9.8 on the Windows 10 operating system. The confusion matrix provides the true positive (TP), false positive (FP), false negative (FN), and true negative (TN) counts, from which several performance measures were calculated to evaluate the techniques. The evaluation showed that Random Forest (with 100 trees), a Deep Learning network (with five hidden layers), and a Neural Network (with a training rate of 0.02 and 100 cycles) reached the highest accuracy, at 99.09%, 98.04%, and 96.52%, respectively. Notably, Random Forest, Support Vector Machine, and the Deep Learning network each achieved an AUC of 1. Data mining can be considered one of the most useful tools for analyzing health-related data. These classification methods can help specialists in the medical diagnosis process by extracting hidden patterns from raw data.
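As an illustration of how performance measures follow from the confusion matrix, the sketch below computes accuracy, precision, recall (sensitivity), specificity, and F1 from TP, FP, FN, and TN counts. It is a minimal Python example, not the RapidMiner workflow used in the study; the class and function names are hypothetical, and the counts in the usage lines are invented for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix counts (positive class = patient)."""
    tp: int  # patients correctly classified as patients
    fp: int  # non-patients incorrectly classified as patients
    fn: int  # patients incorrectly classified as non-patients
    tn: int  # non-patients correctly classified as non-patients


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def performance_measures(cm: ConfusionMatrix) -> dict:
    """Compute common performance measures from confusion matrix counts."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    return {
        "accuracy": _ratio(cm.tp + cm.tn, total),
        "precision": _ratio(cm.tp, cm.tp + cm.fp),
        "recall": _ratio(cm.tp, cm.tp + cm.fn),        # sensitivity
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),
        "f1": _ratio(2 * cm.tp, 2 * cm.tp + cm.fp + cm.fn),
    }


# Hypothetical counts, chosen only to demonstrate the arithmetic.
example = ConfusionMatrix(tp=90, fp=5, fn=10, tn=95)
measures = performance_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

AUC is not included here because it depends on the ranking of predicted scores across thresholds and cannot be derived from a single confusion matrix.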
