Emotion Detection with n-stage Latent Dirichlet Allocation for Turkish Tweets

Understanding the reasons behind the emotions expressed on social media is key to characterizing the mood of previously unseen text. The ability to classify mood makes this technology useful in a variety of fields. In this study, Latent Dirichlet Allocation (LDA), a topic modeling algorithm, was used to determine which emotions tweets on Twitter express. The dataset consists of 4000 tweets categorized into 5 emotions: anger, fear, happiness, sadness, and surprise. Models were created using three root extraction methods: Zemberek, Snowball, and taking the first 5 letters of each word. The generated models were tested using the proposed n-stage LDA method, which aims to increase a model's success rate by decreasing the number of words in the dictionary. For the 5 classes, multi-stage LDA increased the success rate from 60.375% with normal LDA to 70.5% with 2 stages and 76.375% with 3 stages.
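Of the three root extraction methods, the first-5-letters approach is simple enough to illustrate directly. The following is a minimal sketch, not the authors' implementation: the function names `first_n_letters_root` and `build_dictionary`, the whitespace tokenization, and the sample words are all assumptions made for illustration. It shows how truncating words to their first 5 letters merges inflected forms and so shrinks the dictionary passed to a topic model.

```python
from collections import Counter


def first_n_letters_root(word: str, n: int = 5) -> str:
    """Approximate a word's root by keeping its first n letters.

    Hypothetical helper: lowercasing uses Python's default rules, which do
    not handle the Turkish dotted/dotless i distinction correctly.
    """
    return word.lower()[:n]


def build_dictionary(documents, n: int = 5) -> Counter:
    """Count truncated roots over whitespace-tokenized documents."""
    counts = Counter()
    for doc in documents:
        for token in doc.split():
            counts[first_n_letters_root(token, n)] += 1
    return counts


# Illustrative input only; these are not tweets from the paper's dataset.
sample_docs = ["kitaplar kitaplarda kitaplardan", "sevindim sevindik"]

# Vocabulary before truncation: every distinct surface form.
raw_vocab = {token.lower() for doc in sample_docs for token in doc.split()}

# Vocabulary after truncation: inflected forms share one root.
root_counts = build_dictionary(sample_docs)

print(len(raw_vocab), "distinct surface forms")
print(len(root_counts), "distinct truncated roots")
```

Words shorter than 5 letters pass through unchanged, and unrelated words that share a 5-letter prefix are merged, which is the trade-off of this method against a morphological analyzer such as Zemberek.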
