Automated text analysis methods and application areas in political science

Automated text analysis has become a rapidly growing field in political science because it makes it possible to analyze large-scale textual data in ways that were not previously feasible. However, the wide range of available methods makes it difficult for researchers to choose the approach best suited to their research questions and data. This article aims to give researchers a comprehensive overview of the automated text analysis methods used to study political phenomena: simple statistical summaries, supervised and unsupervised machine learning, distributional semantic models, and word embeddings. It compares the data requirements, basic assumptions, outputs, strengths, and weaknesses of basic methods, such as computing simple frequency distributions and similarity/distance measures, and of more advanced methods. The study highlights the potential of these methods to contribute to political science and provides examples from their application areas.

Keywords:

Automated Text Analysis, Political Science, Big Data, Machine Learning, Research Methods

