Utku KOÇ, Türkan SEVGİLİ

Consumer loans’ first payment default detection: a predictive model

A defaulted loan (also called a nonperforming loan) occurs when bank conditions are not met and repayment cannot be made in accordance with the terms of a loan that has reached its maturity. In this study, we provide a predictive analysis of consumer behavior concerning a loan's first payment default (FPD), using a real dataset of approximately 600,000 consumer loan records from a bank. We apply logistic regression, naive Bayes, support vector machine, and random forest to oversampled and undersampled data, building eight different models to predict FPD loans. A two-class random forest using undersampling scored more than 86% on all performance measures: accuracy, precision, recall, and F1-score. With oversampling, the corresponding scores reach as high as 96%. However, when tested on the real and balanced dataset, the performance of oversampling deteriorates, because generating synthetic data for an extremely imbalanced dataset harms the training procedure of the algorithms. The study also provides an understanding of the reasons for nonperforming loans and helps to manage credit risks more consciously.
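
The resampling step and the four performance measures named above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `undersample`, `oversample`, and `scores` are hypothetical, label 1 is assumed to mark an FPD loan, and the oversampler here simply duplicates minority rows at random rather than generating synthetic records.

```python
import random


def _split_indices(labels):
    """Return (minority, majority) index lists for binary 0/1 labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority, majority = _split_indices(labels)
    keep = minority + rng.sample(majority, len(minority))
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal in size.

    Assumption: plain duplication stands in for synthetic-data generation.
    """
    rng = random.Random(seed)
    minority, majority = _split_indices(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = minority + majority + extra
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def scores(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data (made up for illustration): 980 non-default and 20 default loans.
toy_rows = list(range(1000))
toy_labels = [0] * 980 + [1] * 20
```

On the toy labels, a model that always predicts "no default" reaches 0.98 accuracy but 0.0 recall, which is why a balanced training set and several measures are reported together rather than accuracy alone.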
