Elanur TÜRKÜZ, Ebru ÇAĞLAYAN

Econometrics and Machine Learning: An Evaluation in Terms of Choice Models and Classification Algorithms

Econometrics and machine learning each cover a wide range of applications and techniques. This study considers the qualitative choice models that econometrics uses when the dependent variable is qualitative, together with the classification algorithms used in machine learning, and aims to investigate how a bridge can be built between the two fields. It examines the problems that big data creates for econometrics and the contributions machine learning can make in response. It also examines where prediction-based classification algorithms stand in causality research, an area in which they remain reticent, and sets out the contributions econometrics can offer there.

Keywords:

Econometrics, Machine Learning, Qualitative Choice Models, Classification Algorithms, High Dimensionality
