PERFORMANCE EVALUATION OF ENSEMBLE LEARNING ALGORITHMS ON UNBALANCED CREDIT SCORING DATA SETS

Purpose- Because the data sets used to develop credit scoring models have an unbalanced distribution of samples across classes, the resulting models tend to achieve low accuracy. In this study, we combined ensemble learning algorithms with cost-sensitive learning and compared the performance of the resulting models in order to identify the most effective ones. Methodology- For this purpose, the Bagging and AdaBoost ensemble methods were run on two different credit data sets with decision trees, support vector machines and k-NN as base classifiers. In addition, cost-sensitive learning was applied to Bagging and AdaBoost by increasing the misclassification penalty for the minority class. All of these combinations were then compared. Findings- Using cost-sensitive learning led to better results, as measured by AUC, for both AdaBoost and Bagging. As the class imbalance ratio in the data increased, Bagging with decision trees as base classifiers outperformed AdaBoost. Conclusion- Although building a highly accurate credit scoring model remains an open problem, the models built with ensemble learning methods outperformed those built with individual classifiers. This is consistent with other findings in the literature [Zięba et al., 2012].
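As a rough illustration of the experimental design described above, the sketch below pairs Bagging and AdaBoost with decision-tree, SVM and k-NN base classifiers and compares the resulting models by cross-validated AUC, together with cost-sensitive variants that raise the penalty on minority-class errors through class weights. It assumes scikit-learn (version 1.2 or later, for the `estimator` parameter name); the synthetic data set, the 1:5 cost ratio and the ensemble sizes are illustrative assumptions, not the data sets or settings used in the study. k-NN is paired only with Bagging because scikit-learn's AdaBoostClassifier requires base learners that accept sample weights.

```python
# Illustrative sketch only: synthetic data and a 1:5 cost ratio stand in for
# the study's two credit data sets and its actual misclassification costs.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an imbalanced credit data set
# (minority class = defaulters, roughly 10% of the samples).
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

# Base classifiers compared in the study: decision tree, SVM and k-NN.
# The cost-sensitive tree penalises errors on the minority class (label 1)
# more heavily via class weights (the 1:5 ratio is an illustrative choice).
tree = DecisionTreeClassifier(random_state=0)
cs_tree = DecisionTreeClassifier(class_weight={0: 1, 1: 5}, random_state=0)
svm = SVC(probability=True, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5)

ensembles = {
    # Bagging: bootstrap samples, learners trained independently, votes combined.
    "bagging+tree": BaggingClassifier(estimator=tree, n_estimators=20, random_state=0),
    "bagging+svm": BaggingClassifier(estimator=svm, n_estimators=20, random_state=0),
    "bagging+knn": BaggingClassifier(estimator=knn, n_estimators=20, random_state=0),
    # AdaBoost: learners trained sequentially, misclassified samples reweighted.
    "adaboost+tree": AdaBoostClassifier(estimator=tree, n_estimators=20, random_state=0),
    "adaboost+svm": AdaBoostClassifier(estimator=svm, n_estimators=20, random_state=0),
    # Cost-sensitive ensembles (decision-tree base learner shown as an example).
    "cs-bagging+tree": BaggingClassifier(estimator=cs_tree, n_estimators=20, random_state=0),
    "cs-adaboost+tree": AdaBoostClassifier(estimator=cs_tree, n_estimators=20, random_state=0),
}

# Compare all combinations on AUC, the evaluation measure used in the study.
for name, ens in ensembles.items():
    auc = cross_val_score(ens, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:18s} AUC = {auc:.3f}")
```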

___

  • Paleologo, G. et al. (2010). Subagging for credit scoring models. European Journal of Operational Research, 201, 490–499.
  • Wang, G. et al. (2011). A comparative assessment of ensemble learning for credit scoring. Expert Systems with Applications, 38, 223–230.
  • Marqués, A.I. et al. (2012). Exploring the behaviour of base classifiers in credit scoring ensembles. Expert Systems with Applications, 39, 10244–10250.
  • Wang, G. et al. (2012). A hybrid ensemble approach for enterprise credit risk assessment based on Support Vector Machine. Expert Systems with Applications, 39, 5325–5331.
  • Xiao, J. et al. (2012). Dynamic classifier ensemble model for customer classification with imbalanced class distribution. Expert Systems with Applications, 39, 3668–3675.
  • Zięba, M. et al. (2012). Ensemble Classifier for Solving Credit Scoring Problems. IFIP International Federation for Information Processing, pp. 59–66.
  • Tsai, C.-F. et al. (2014). A comparative study of classifier ensembles for bankruptcy prediction. Applied Soft Computing, 24, 977–984.
  • Tsai, C.-F. et al. (2014). Modeling credit scoring using neural network ensembles. Kybernetes, 43(7), 1114–1123.
  • Chen, N. et al. (2015). Comparative study of classifier ensembles for cost-sensitive credit risk assessment. Intelligent Data Analysis, 19, 127–144.
  • Lessmann, S. et al. (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247, 124–136.
  • Dahiya, S. et al. (2015). Credit Scoring Using Ensemble of Various Classifiers on Reduced Feature Set. Industrija, 43(4).
  • Wang, H. et al. (2015). Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble. PLOS ONE.
  • Durga Devi, C.R. et al. (2016). A Relative Evaluation of the Performance of Ensemble Learning in Credit Scoring. IEEE International Conference on Advances in Computer Applications (ICACA).
  • Salunkhe, U.R. & Mali, S.N. (2016). Classifier Ensemble Design for Imbalanced Data Classification: A Hybrid Approach. International Conference on Computational Modeling and Security (CMS 2016), Procedia Computer Science, 85, 725–732.
  • Zhu, Y. et al. (2017). Comparison of individual, ensemble and integrated ensemble machine learning methods to predict China's SME credit risk in supply chain finance. Neural Computing and Applications, 28(Suppl 1), S41–S50.
  • Galar, M. et al. (2012). A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42(4), 463–485.
  • Ling, C.X. & Sheng, V.S. (2008). Cost-Sensitive Learning and the Class Imbalance Problem. Encyclopedia of Machine Learning.
  • Zhou, Z.-H. (2012). Ensemble Methods: Foundations and Algorithms. Cambridge, UK: CRC Press.
  • ETS Asset Management Factory (2016, April 20). What is the difference between Bagging and Boosting? Retrieved from https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/
  • Schapire, R. (2013, October 9). Explaining AdaBoost. Retrieved from http://rob.schapire.net/papers/explaining-adaboost.pdf
  • Polikar, R. (2008, December 22). Ensemble Learning. Retrieved from http://www.scholarpedia.org/article/Ensemble_learning
  • Lutins, E. (2017, August 1). Ensemble Methods in Machine Learning: What Are They and Why Use Them? Retrieved from https://towardsdatascience.com/ensemble-methods-in-machine-learning-what-are-they-and-why-use-them-68ec3f9fef5f