Performance evaluation of classifier algorithms using Python for heart disease risk prediction

Heart disease is one of the biggest health problems of our time. Early diagnosis of the disease can prevent premature deaths. To this end, 13 independent variables from a data set obtained from the Kaggle database were used to distinguish people with a low probability of heart disease (KHOA) from those with a high probability (KHOF). Seven classification algorithms were used in the study: support vector machines (SVM), k-nearest neighbors (k-NN), decision trees (DT), linear discriminant analysis (LDA), Gaussian Naive Bayes (GNB), Gradient Boosting (GB), and Random Forest (RF). Random Forest was the best-performing algorithm in the study in terms of specificity (100%), Matthews correlation coefficient (0.90), Fowlkes-Mallows index (0.82), F1 score (89.7%), and accuracy (90.2%). For fasting blood sugar, no statistically significant difference was found between the KHOA and KHOF groups, and it was found to be the least important of the features. Classification runs with this feature removed showed no significant change in performance; only the processing times shortened slightly. Because it will support early diagnosis, this study will be useful in predicting heart disease.

Performance evaluation of classifier algorithms with Python for heart disease risk prediction

Heart disease is one of today's biggest health problems. Early diagnosis can prevent premature deaths. For this purpose, 13 independent variables from a data set obtained from the Kaggle database were used to distinguish people with a low probability of heart disease from those with a high probability. Seven classification algorithms were used in the study: support vector machines (SVM), k-nearest neighbors (k-NN), decision trees, linear discriminant analysis (LDA), Gaussian Naive Bayes (GNB), Gradient Boosting (GB), and Random Forest (RF). Random Forest gave the best predictions in the study according to specificity (100%), Matthews correlation coefficient (0.90), Fowlkes-Mallows index (0.82), F1 score (89.7%), and accuracy (90.2%). Fasting blood glucose showed no statistically significant difference between the two groups and was found to be the least important of the features. Removing this feature produced no significant change in classification performance; only the processing times became slightly shorter. This study will help predict heart disease, as it supports early diagnosis.
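The evaluation metrics named above can be illustrated with a minimal Python sketch that derives them from the four counts of a binary confusion matrix. This is not the study's code: the function name `binary_metrics` and the example counts are hypothetical, and the Fowlkes-Mallows index is assumed here to be the geometric mean of precision and recall, since the abstract does not state which definition was used.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives, and false negatives. Ratios with a zero denominator
    are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Assumption: Fowlkes-Mallows as the geometric mean of precision and recall.
    fowlkes_mallows = math.sqrt(precision * recall)
    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "fowlkes_mallows": fowlkes_mallows,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only; these are not the study's results.
m = binary_metrics(tp=40, fp=0, tn=30, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With no false positives, as in these made-up counts, specificity is 100% while the other metrics still reflect the missed positive cases, which is why several metrics are usually reported together.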

___

Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi
  • ISSN: 1302-9304
  • Publication frequency: 3 issues per year
  • First published: 1999
  • Publisher: Dokuz Eylül Üniversitesi Mühendislik Fakültesi