Mansur Alp TOÇOĞLU, Azer ÇELİKTEN, İrfan AYGÜN, Adil ALPKOÇAK

Comparison of Different Machine Learning Approaches for Emotion Analysis in Turkish

In this study, the classification results of different machine learning algorithms were compared on TREMO, a validated data set used for emotion extraction from Turkish texts. Emotion analysis was treated as a text classification problem, and four algorithms were investigated: Artificial Neural Networks (ANN), Support Vector Machines (SVM), Random Forest (RF), and K-Nearest Neighbor (KNN). The six emotion categories provided by the data set (happiness, fear, anger, sadness, disgust, and surprise) were used. In the preprocessing phase, words were stemmed with the first-five-characters (F5) method, which truncates each word to its first five characters. After stemming, the data set was modeled using the Vector Space Model, and the 500 most important words for each emotion were selected with the Mutual Information (MI) method. Classification results were compared on the basis of accuracy. According to the experimental results, the ANN classifier performed best, followed by SVM, RF, and KNN in descending order of performance.

Keywords:

TREMO, Emotion Analysis, Machine Learning, Text Mining
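
The preprocessing steps named in the abstract (F5 stemming followed by Mutual Information term selection) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary term presence per document and the standard document-count form of MI between a term and a class, which the abstract does not specify. The four toy sentences, the helper names (`f5_stem`, `tokenize`, `mutual_information`, `top_terms`), and the whitespace tokenizer are all hypothetical and are not taken from TREMO.

```python
import math

def f5_stem(word: str) -> str:
    """F5 heuristic: keep only the first five characters of a word."""
    return word[:5]

def tokenize(text: str) -> list[str]:
    # Assumption: naive whitespace tokenization. Note that str.lower() does not
    # apply Turkish casing rules (I -> ı), so real data would need extra care.
    return [f5_stem(tok) for tok in text.lower().split()]

def mutual_information(n11: int, n10: int, n01: int, n00: int) -> float:
    """MI (in bits) between term presence and class membership.

    n11: docs with the term, in the class     n10: with the term, not in the class
    n01: docs without the term, in the class  n00: without the term, not in the class
    """
    n = n11 + n10 + n01 + n00
    cells = (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    )
    total = 0.0
    for n_tc, n_t, n_c in cells:
        if n_tc > 0:  # an empty cell contributes nothing to the sum
            total += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return total

def top_terms(docs: list[tuple[str, str]], label: str, k: int) -> list[str]:
    """Return the k stems with the highest MI for `label` (ties broken alphabetically)."""
    tokenized = [(set(tokenize(text)), lab) for text, lab in docs]
    vocab = set().union(*(toks for toks, _ in tokenized))
    scores = {}
    for term in vocab:
        n11 = sum(1 for toks, lab in tokenized if term in toks and lab == label)
        n10 = sum(1 for toks, lab in tokenized if term in toks and lab != label)
        n01 = sum(1 for toks, lab in tokenized if term not in toks and lab == label)
        n00 = sum(1 for toks, lab in tokenized if term not in toks and lab != label)
        scores[term] = mutual_information(n11, n10, n01, n00)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Invented toy corpus for illustration only (not drawn from TREMO).
TOY_DOCS = [
    ("bugün çok mutluyum", "happiness"),
    ("sınavı kazandım çok mutluyum", "happiness"),
    ("karanlıktan çok korkuyorum", "fear"),
    ("köpekten korkuyorum", "fear"),
]

if __name__ == "__main__":
    print(top_terms(TOY_DOCS, "happiness", 2))
```

In this form MI also scores highly those terms whose absence signals the class, so on the toy corpus the stem `korku` ranks alongside `mutlu` for the happiness class. With the setup described in the abstract, `k` would be 500 per emotion, and the selected stems would define the dimensions of the Vector Space Model passed to the classifiers.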
