Development of a Python-Based Classification Web Interface for Independent Datasets

Classification is a fundamental approach that is frequently used in many research areas, such as biomedicine, bioinformatics, medicine, and engineering. In the field of health especially, it has become common to classify diseases with machine learning methods using their risk factors, and to determine how strongly each risk factor affects the related disease. Both commercial and free software tools are available for researchers to analyze their data with classification methods. The aim of this study is to develop user-friendly, web-based software for classification analysis. The Python scikit-learn (sklearn) and Dash libraries were used to develop the software. The classification algorithms available in the developed software are logistic regression, decision trees, support vector machines, random forest, LightGBM, Gaussian naive Bayes, AdaBoost, and XGBoost. To demonstrate how the software works, a classification model was built with the random forest algorithm using the cervical cancer data set, and several metrics were evaluated for the model. The accuracy, sensitivity, specificity, negative predictive value, Matthews correlation coefficient, and F1 score obtained from the random forest classification model were 94.44%, 100%, 93.33%, 100%, 83.67%, and 94.44%, respectively. The classification software developed in this study is expected to make it considerably easier for clinicians and medical researchers to apply predictive classification algorithms to disease data without requiring programming knowledge.
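All of the metrics reported above are functions of the four cells of a binary confusion matrix (true/false positives and negatives). The following is a minimal sketch in plain Python, not the software's own code, showing how each one is computed; the `ConfusionMatrix` class, the `binary_metrics` function, and the counts in the usage example are hypothetical. F1 is computed here for the positive class only. The abstract does not state which averaging was used, and other schemes give different values (for example, a micro-averaged F1 in single-label classification equals accuracy).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical container for the four cells of a binary confusion matrix."""
    tp: int  # diseased cases predicted as diseased
    fp: int  # healthy cases predicted as diseased
    tn: int  # healthy cases predicted as healthy
    fn: int  # diseased cases predicted as healthy


def binary_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Compute common binary classification metrics as fractions in [0, 1] (MCC in [-1, 1])."""

    def ratio(num: float, den: float) -> float:
        # Return NaN instead of raising when a metric is undefined (zero denominator).
        return num / den if den else float("nan")

    n = cm.tp + cm.fp + cm.tn + cm.fn
    mcc_den = math.sqrt(
        (cm.tp + cm.fp) * (cm.tp + cm.fn) * (cm.tn + cm.fp) * (cm.tn + cm.fn)
    )
    return {
        "accuracy": ratio(cm.tp + cm.tn, n),
        "sensitivity": ratio(cm.tp, cm.tp + cm.fn),   # recall / true positive rate
        "specificity": ratio(cm.tn, cm.tn + cm.fp),   # true negative rate
        "npv": ratio(cm.tn, cm.tn + cm.fn),           # negative predictive value
        "precision": ratio(cm.tp, cm.tp + cm.fp),     # positive predictive value
        "mcc": ratio(cm.tp * cm.tn - cm.fp * cm.fn, mcc_den),
        "f1": ratio(2 * cm.tp, 2 * cm.tp + cm.fp + cm.fn),  # positive-class F1
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not the study's data).
    metrics = binary_metrics(ConfusionMatrix(tp=8, fp=2, tn=18, fn=2))
    for name, value in metrics.items():
        print(f"{name}: {value:.2%}")
```

With these illustrative counts the sketch reports an accuracy of 86.67%, a sensitivity and F1 of 80.00%, a specificity and NPV of 90.00%, and an MCC of 70.00%. In a scikit-learn based tool, the same quantities can also be obtained from `sklearn.metrics` (for example `matthews_corrcoef` and `f1_score`).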
