Lexicon-based emotion analysis in Turkish

In this paper, we proposed a lexicon for emotion analysis in Turkish covering six emotion categories: happiness, fear, anger, sadness, disgust, and surprise. We also investigated the effects of a lemmatizer and a stemmer, two term-weighting schemes, four lexicon enrichment methods, and a term selection approach on lexicon construction. To this end, we generated a Turkish emotion lexicon from TREMO, a dataset containing 25,989 documents. We first preprocessed the documents with a lemmatizer and a stemmer to obtain the dictionary form and the stem of each term. We then proposed two weighting schemes that take into account term frequency, term-class frequency, and mutual information (MI) values for the six emotion categories. Next, we enriched the lexicon using bigram and concept hierarchy methods and performed term selection to improve efficiency. Finally, we compared the lexicon-based approach, using our proposed lexicon, with a machine learning-based approach. The experiments showed that the proposed lexicon efficiently produces results comparable to those of the machine learning-based approach for emotion analysis of Turkish text.
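
The general idea of weighting lexicon terms by their association with an emotion category can be illustrated with a minimal sketch. It is not the weighting scheme proposed in the paper, whose exact formulas are not given here. It assumes pointwise mutual information computed from document-level counts as a stand-in, keeps only positive associations, and classifies a document by summing the weights of its terms. The function names `build_lexicon` and `classify` and the four toy documents are hypothetical and are not taken from TREMO.

```python
import math
from collections import Counter, defaultdict


def build_lexicon(docs):
    """Build {term: {label: weight}} from (tokens, label) pairs.

    Assumption: the weight is pointwise mutual information,
    log2(P(term, label) / (P(term) * P(label))), estimated from
    document-level presence counts. Only positive values are kept.
    """
    n_docs = len(docs)
    class_count = Counter(label for _, label in docs)
    term_count = Counter()                   # documents containing the term
    term_class_count = defaultdict(Counter)  # the same count, split by label

    for tokens, label in docs:
        for term in set(tokens):  # count each term once per document
            term_count[term] += 1
            term_class_count[term][label] += 1

    lexicon = {}
    for term, per_class in term_class_count.items():
        weights = {}
        for label, joint in per_class.items():
            pmi = math.log2(
                joint * n_docs / (term_count[term] * class_count[label])
            )
            if pmi > 0:
                weights[label] = pmi
        if weights:
            lexicon[term] = weights
    return lexicon


def classify(tokens, lexicon):
    """Return the label with the highest summed weight, or None."""
    scores = Counter()
    for term in tokens:
        for label, weight in lexicon.get(term, {}).items():
            scores[label] += weight
    return scores.most_common(1)[0][0] if scores else None


# Hypothetical toy corpus (already lowercased and tokenized).
docs = [
    (["cok", "mutlu", "oldum"], "happiness"),
    (["mutlu", "bir", "gun"], "happiness"),
    (["cok", "korktum"], "fear"),
    (["karanlik", "korktum"], "fear"),
]

lexicon = build_lexicon(docs)
print(classify(["bugun", "cok", "mutlu"], lexicon))
print(classify(["cok", "korktum"], lexicon))
```

In this toy setting a term that appears equally often in both categories (here `cok`) receives no weight and drops out of the lexicon, which loosely mirrors the role of term selection. A real pipeline would first pass the text through a lemmatizer or stemmer, as the abstract describes.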
