An Application of Feature Selection Methods to Compare the Performances of Classification Algorithms

This study aims to use feature selection methods to identify a smaller set of meaningful variables from among the many variables in a dataset. In recent years, feature selection methods have become important and effective tools in statistics, and they offer researchers considerable convenience. Different feature selection techniques retain different numbers of variables in the model, and the resulting correct classification rates can differ as well. Representing a dataset that has many variables of interest with fewer variables, while keeping a high classification rate, therefore saves time and cost. In this study, the variables of a chronic kidney disease dataset were first analyzed with different feature selection methods, producing new datasets. These new datasets, each containing a different number of variables, were then analyzed with different machine learning techniques to determine the best-performing technique. The highest classification rate, 99.75%, was obtained with correlation-based feature selection combined with the random forest and multilayer perceptron classifiers; the filter method combined with the k-nearest neighbor classifier reached the same rate. Compared with other studies that used the same dataset, the correct classification rate obtained in this study is higher.
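The abstract names the pipeline (feature selection, then classification scored by the correct classification rate) but not its implementation. As an illustration only, here is a minimal pure-Python sketch of correlation-based feature selection (CFS) followed by a k-nearest neighbor (kNN) classifier. It is not the authors' code, and several choices are assumptions: the CFS merit of a subset of k features is computed as k * mean_rcf / sqrt(k + k(k-1) * mean_rff), where mean_rcf is the mean absolute feature-class Pearson correlation and mean_rff the mean absolute feature-feature correlation; a simple greedy forward search stands in for whatever search strategy the study used; accuracy is estimated by leave-one-out; and all function names, including the `make_toy_data` generator, are hypothetical. The toy data are synthetic, not the chronic kidney disease dataset.

```python
# Illustrative sketch only: names and design choices are hypothetical,
# not the study's implementation.
import math
import random
from collections import Counter
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, r_cf, r_ff):
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|) for a feature subset."""
    k = len(subset)
    rcf = mean(r_cf[j] for j in subset)
    pairs = [r_ff[a][b] for i, a in enumerate(subset) for b in subset[i + 1:]]
    rff = mean(pairs) if pairs else 0.0
    return k * rcf / math.sqrt(k + k * (k - 1) * rff)


def cfs_forward(X, y):
    """Greedy forward search: keep adding the feature that raises the merit most."""
    n_feat = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_feat)]
    r_cf = [abs(pearson(c, y)) for c in cols]  # feature-class correlations
    r_ff = [[abs(pearson(cols[a], cols[b])) for b in range(n_feat)]
            for a in range(n_feat)]  # feature-feature correlations
    selected, best = [], 0.0
    while len(selected) < n_feat:
        scored = [(cfs_merit(selected + [j], r_cf, r_ff), j)
                  for j in range(n_feat) if j not in selected]
        merit, j = max(scored)
        if merit <= best:  # no remaining feature improves the merit
            break
        selected.append(j)
        best = merit
    return selected


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip((math.dist(row, x) for row in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Correct classification rate (correct / total), estimated by leave-one-out."""
    Xs = [[row[j] for j in features] for row in X]
    correct = 0
    for i in range(len(Xs)):
        train_X, train_y = Xs[:i] + Xs[i + 1:], y[:i] + y[i + 1:]
        correct += knn_predict(train_X, train_y, Xs[i], k) == y[i]
    return correct / len(Xs)


def make_toy_data(n=60, seed=0):
    """Hypothetical toy data, NOT the chronic kidney disease dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        f0 = label + rng.gauss(0, 0.3)  # informative
        f1 = label + rng.gauss(0, 0.3)  # informative
        f2 = f0 + rng.gauss(0, 0.05)  # redundant near-copy of f0
        f3, f4 = rng.gauss(0, 1), rng.gauss(0, 1)  # pure noise
        X.append([f0, f1, f2, f3, f4])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    chosen = cfs_forward(X, y)
    # Caveat: selecting features on all rows before leave-one-out makes the
    # estimate optimistic; a stricter protocol reselects inside each fold.
    print("selected features:", chosen)
    print("kNN accuracy (selected):", loo_accuracy(X, y, chosen))
    print("kNN accuracy (all):", loo_accuracy(X, y, range(5)))
```

On this toy data the search is expected to keep feature 1 plus one of the two redundant copies (features 0 and 2) and to drop the noise columns; exact printed numbers depend on the random draw, so none are shown. For real clinical data such as the chronic kidney disease set, numeric features would normally be rescaled before kNN, because Euclidean distance is dominated by features with large ranges.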

Keywords:

Classification methods, Feature selection methods, Machine learning, Chronic kidney disease
