Emotion Detection with n-stage Latent Dirichlet Allocation for Turkish Tweets

Understanding the emotions expressed on social media plays a key role in characterizing the mood of previously unseen texts, and being able to classify mood makes this technology useful in a variety of fields. In this study, Latent Dirichlet Allocation (LDA), a topic modeling algorithm, was used to determine which emotions tweets on Twitter express. The dataset consists of 4000 tweets categorized into 5 emotions: anger, fear, happiness, sadness, and surprise. The Zemberek, Snowball, and first-5-letters root extraction methods were used to create the models. The generated models were tested using the proposed n-stage LDA method, which aims to increase the model's success rate by decreasing the number of words in the dictionary. With multi-stage LDA we achieved higher success rates for 5 classes (2-stage: 70.5%, 3-stage: 76.375%) than the state-of-the-art result of 60.375%, which was obtained with plain LDA.
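The idea of shrinking the dictionary between LDA stages can be illustrated with a minimal sketch. The abstract does not state the pruning rule, so this sketch assumes that each stage keeps only the `top_k` highest-weighted words per topic and then refits LDA on the reduced dictionary. The function names (`stem_first5`, `fit_lda`, `prune_vocab`, `n_stage_lda`), the hyperparameters, and the toy tweets are all hypothetical and are not taken from the study. A small collapsed Gibbs sampler stands in for a full LDA implementation.

```python
import random


def stem_first5(word):
    """First-5-letters root extraction: truncate a token to five characters."""
    return word[:5]


def fit_lda(docs, vocab, n_topics, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA; returns topic-word count rows."""
    rng = random.Random(seed)
    word_id = {w: i for i, w in enumerate(vocab)}
    # Tokens outside the current dictionary are dropped.
    corpus = [[word_id[w] for w in doc if w in word_id] for doc in docs]
    n_words = len(vocab)
    topic_word = [[0] * n_words for _ in range(n_topics)]
    doc_topic = [[0] * n_topics for _ in corpus]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(corpus):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            topic_word[z][w] += 1
            doc_topic[d][z] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                topic_word[z][w] -= 1
                doc_topic[d][z] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                topic_word[z][w] += 1
                doc_topic[d][z] += 1
                topic_total[z] += 1
    return topic_word


def prune_vocab(vocab, topic_word, top_k):
    """Assumed pruning rule: keep the top_k highest-count words of each topic."""
    keep = set()
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)
        keep.update(ranked[:top_k])
    return [w for j, w in enumerate(vocab) if j in keep]


def n_stage_lda(docs, n_topics, n_stages, top_k):
    """Fit LDA n_stages times, shrinking the dictionary before each refit."""
    vocab = sorted({w for doc in docs for w in doc})
    topic_word = fit_lda(docs, vocab, n_topics)
    for _ in range(n_stages - 1):
        vocab = prune_vocab(vocab, topic_word, top_k)
        topic_word = fit_lda(docs, vocab, n_topics)
    return vocab, topic_word


# Invented toy tweets, for illustration only (not from the 4000-tweet dataset).
raw_tweets = [
    "mutluluk sevinç mutluyum harika",
    "korku korkuyorum karanlık endişe",
    "öfke öfkeliyim sinir kızgın",
    "mutluluk harika sevinçli gün",
    "korku endişe karanlık gece",
    "sinir öfke kızgınım berbat",
]
docs = [[stem_first5(t) for t in tweet.split()] for tweet in raw_tweets]
full_vocab = {w for doc in docs for w in doc}
vocab, topic_word = n_stage_lda(docs, n_topics=3, n_stages=3, top_k=3)
print(len(full_vocab), "->", len(vocab), "dictionary words after 3 stages")
```

In this sketch a 2-stage run prunes once and a 3-stage run prunes twice. Classifying a tweet into one of the five emotions would require an additional step that maps topics to labels, which is not shown here.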

Academic Platform Journal of Engineering and Smart Systems
  • Publication frequency: 3 issues per year
  • Founded: 2022
  • Publisher: Akademik Perspektif Derneği