AN EXPERIMENTAL COMPARISON OF TRADITIONAL AND MACHINE LEARNING METHODS PREDICTION PERFORMANCES: A STUDY ON HEALTH OUTCOMES

Machine learning techniques can identify non-linear patterns in a dataset and uncover hidden relationships. Random forest is a modern machine learning technique that offers an alternative to traditional classification methods such as logistic regression. This study aims to compare the prediction performance of logistic regression with that of random forest, and to identify the factors that predict public health outcomes at the provincial level. Data for Turkey's 81 provinces for the year 2013 are obtained from the Turkish Statistical Institute. Life expectancy at birth and mortality are chosen as the public health outcomes. Three random forest models are constructed, with 50, 100, and 150 trees, respectively. The prediction results of each method are recorded while the parameter k in k-fold cross-validation is varied from 3 to 20. The area under the ROC curve (AUC), sensitivity, and specificity are used as performance measures. The results reveal that the differences in performance between the prediction models for health outcomes are statistically significant (p
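The evaluation protocol summarized above can be illustrated with a minimal Python sketch that uses only the standard library. The protocol has three parts: k-fold cross-validation with k running from 3 to 20, scoring by AUC, and scoring by sensitivity and specificity. This is not the authors' code, and several choices are assumptions:

- The data are synthetic, a hypothetical stand-in for the 81 provinces rather than the TurkStat variables.
- The classifier is a plain gradient-descent logistic regression.
- Out-of-fold predictions are pooled before scoring.
- The 0.5 classification cut-off is assumed.

All function names (`k_fold_indices`, `auc`, `sensitivity_specificity`, `fit_logistic`, `predict_one`, `cross_validate`, `make_synthetic`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold: float = 0.5) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP); the 0.5 cut-off is an assumption."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def predict_one(w: Vector, x: Vector) -> float:
    """Logistic model P(y=1|x) = 1 / (1 + exp(-(b + w.x))); w[-1] holds the intercept b."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))  # zip stops before the intercept
    z = max(-500.0, min(500.0, z))  # keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[Vector], y: List[int], lr: float = 0.5, epochs: int = 100) -> Vector:
    """Batch gradient descent on the mean log-loss, no regularization (illustrative only)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_one(w, xi) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def cross_validate(X, y, k, fit, predict) -> Tuple[float, float, float]:
    """k-fold CV: score each case with a model that never saw it, then pool scores (an assumption)."""
    scores = [0.0] * len(y)
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            scores[i] = predict(model, X[i])
    sens, spec = sensitivity_specificity(y, scores)
    return auc(y, scores), sens, spec


def make_synthetic(n: int = 81, seed: int = 1):
    """Hypothetical synthetic data: two features and a noisy binary outcome (not TurkStat data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([a, b])
        y.append(1 if a - 0.5 * b + rng.gauss(0, 0.5) > 0 else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    for k in range(3, 21):  # k = 3, 4, ..., 20, as in the study
        a, se, sp = cross_validate(X, y, k, fit_logistic, predict_one)
        print(f"k={k:2d}  AUC={a:.3f}  sensitivity={se:.3f}  specificity={sp:.3f}")
```

A random forest does not fit in a short standard-library sketch. Any model that can be wrapped as a `fit(X, y)` / `predict(model, x)` pair could be passed to `cross_validate` instead. One example is a thin wrapper around scikit-learn's `RandomForestClassifier`, whose `n_estimators` parameter sets the number of trees (50, 100, or 150 in the study). Pooling out-of-fold scores keeps AUC defined even when a small fold (about 4 cases at k = 20) happens to contain only one outcome class.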

Hacettepe Sağlık İdaresi Dergisi
  • Publication frequency: 4 issues per year
  • First published: 2015
  • Publisher: Hacettepe University Faculty of Economics and Administrative Sciences