Automated text analysis methods and application areas in political science

Automated text analysis has become a rapidly growing field in political science because it allows large volumes of textual data to be analyzed in ways that were not previously possible. However, the sheer number of available methods makes it difficult for researchers to choose the approach best suited to their research questions and data. This article provides an overview of statistical summaries, supervised and unsupervised machine learning, distributional semantic models, and word embedding methods as tools for examining political phenomena. It compares the data requirements, outputs, basic assumptions, strengths, and weaknesses of basic methods, such as simple frequency distributions and similarity/distance measures, and of more advanced methods. It also highlights the potential contribution of these methods to political science and presents examples from application areas.
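
As a rough illustration of the most basic methods named in the abstract, the sketch below computes a simple frequency distribution for two short texts and a cosine similarity between them. The toy sentences, the whitespace tokenizer, and the function names `term_frequencies` and `cosine_similarity` are assumptions made for this example only; they are not taken from the article, which does not prescribe a particular implementation.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    # Only tokens present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical toy "documents" standing in for political texts.
doc_a = "tax cuts and lower tax rates"
doc_b = "tax increases and public spending"

tf_a = term_frequencies(doc_a)
tf_b = term_frequencies(doc_b)
similarity = cosine_similarity(tf_a, tf_b)

print(tf_a.most_common(3))
print(round(similarity, 3))
```

A real application would typically add preprocessing steps such as punctuation handling and stop-word removal before counting; they are omitted here to keep the sketch short.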
