FEATURE SELECTION WITH GENETIC ALGORITHMS AND ITS EFFECT ON CLASSIFICATION PERFORMANCE IN MEDICAL DATASETS

Extracting useful information from very large medical databases has become increasingly difficult for clinical decision support systems. Genetic algorithms (GAs) are widely used for feature selection and can find near-optimal feature subsets. In this study, a GA-based model is proposed to select features from medical data containing many complex attributes and to improve classification performance by building the most suitable feature subset. To evaluate the proposed method, five well-known, publicly available medical datasets and seven supervised classification methods were used. Feature selection and classification were performed separately for each dataset and classifier pair. The results show that, depending on the dataset, classification with the proposed approach increased accuracy, and hence machine learning model performance, by 2% to 21% on average. In addition, among the supervised classification algorithms, Random Forest outperformed the other classifiers on all datasets, standing out for its classification performance on medical data.
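A GA wrapper of the kind described above can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: the toy dataset, the nearest-centroid classifier (standing in for the seven classifiers used in the study, such as Random Forest), 5-fold cross-validated accuracy as the fitness, the small per-feature penalty, and the operator settings (tournament size 3, single-point crossover, mutation rate 0.05, elitism) are all hypothetical choices. Each chromosome is a bit mask over the features.

```python
import random
from statistics import mean


def make_toy_data(n=120, d=10, informative=3, seed=0):
    """Hypothetical two-class data: only the first `informative` columns
    carry a class signal; the remaining columns are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 1.0) if j < informative
                  else rng.gauss(0.0, 3.0) for j in range(d)])
        y.append(label)
    return X, y


def cv_accuracy(X, y, mask, folds=5):
    """k-fold accuracy of a nearest-centroid classifier (a simple stand-in
    classifier) that uses only the columns where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    n, correct = len(X), 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))  # the folds partition the samples
        centroids = {}
        for label in set(y):
            rows = [X[i] for i in range(n) if i not in test_idx and y[i] == label]
            centroids[label] = [mean(r[j] for r in rows) for j in cols]
        for i in test_idx:
            dist = {lab: sum((X[i][j] - c[k]) ** 2 for k, j in enumerate(cols))
                    for lab, c in centroids.items()}
            if min(dist, key=dist.get) == y[i]:
                correct += 1
    return correct / n


def fitness(X, y, mask, penalty=0.001):
    """Accuracy minus a small per-feature penalty (an assumed choice), so
    that among equally accurate subsets the smaller one is preferred."""
    return cv_accuracy(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, pop_size=20, generations=15, p_mut=0.05, seed=1):
    """Hypothetical GA wrapper: returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    d = len(X[0])
    cache = {}

    def fit(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(X, y, mask)
        return cache[key]

    # Seed the population with the all-features mask plus random bit strings.
    pop = [[1] * d] + [[rng.randint(0, 1) for _ in range(d)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [max(pop, key=fit)[:]]  # elitism: keep the best as-is
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fit)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fit)
            cut = rng.randrange(1, d)              # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips with probability p_mut.
            children.append([1 - b if rng.random() < p_mut else b for b in child])
        pop = children
    best = max(pop, key=fit)
    return best, fit(best)


if __name__ == "__main__":
    X, y = make_toy_data()
    best, score = ga_select(X, y)
    print("all features:", round(cv_accuracy(X, y, [1] * len(X[0])), 3))
    print("GA subset   :", best, round(cv_accuracy(X, y, best), 3))
```

Because the initial population contains the all-features mask and the best individual is carried over unchanged each generation, the returned subset never scores worse than using every feature on this fitness measure. In a real study one would replace `cv_accuracy` with a library classifier and evaluate the final subset on data not used during the search, since accuracy measured on the same folds that guided the search tends to be optimistic.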
