Comparison of n-stage Latent Dirichlet Allocation with other topic modeling methods for emotion analysis

Understanding the emotions expressed in social media posts plays a key role in learning people's thoughts. With developing technology, knowing people's emotions provides benefits in various fields; for example, it allows areas such as media, marketing and advertising to present content tailored to users' habits and opinions. In our study, the topic modeling algorithms Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (P-LSA) were used to determine the emotions of individuals from Turkish tweets. In addition, the performance of the developed n-stage version of the LDA algorithm in emotion analysis was compared with these existing methods. The dataset consists of 4000 tweets belonging to 5 different emotions: anger, fear, happiness, sadness and surprise. All topic modeling methods were modeled on the 3-class and 5-class datasets, and their accuracies and running times were measured. The developed n-stage LDA method was observed to outperform LDA and P-LSA in terms of running time and performance. The most successful and fastest method was LSA.
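The comparison described above fits each topic model on the labeled tweets and measures accuracy and running time. Below is a minimal sketch of such a comparison for LDA and LSA using the gensim library, assuming one topic per emotion class; the tokenized tweets and labels are hypothetical placeholders, and the Turkish preprocessing (e.g. with Zemberek), P-LSA, and the paper's n-stage LDA extension are not shown.

```python
# Minimal sketch: time LDA and LSA fits on labeled tweets and inspect
# each tweet's dominant topic. Data and preprocessing are placeholders.
import time
from gensim.corpora import Dictionary
from gensim.models import LdaModel, LsiModel

# Hypothetical tokenized tweets and their emotion labels (5-class setting).
tweets = [
    ["bugun", "cok", "mutluyum"],
    ["cok", "kizginim"],
    ["bu", "film", "korkunc"],
    ["haber", "beni", "uzdu"],
    ["saskinlik", "icindeyim"],
]
labels = ["happy", "angry", "fear", "sad", "surprise"]
num_topics = len(set(labels))  # one topic per emotion class

dictionary = Dictionary(tweets)
corpus = [dictionary.doc2bow(doc) for doc in tweets]

for name, model_cls in [("LDA", LdaModel), ("LSA", LsiModel)]:
    start = time.time()
    model = model_cls(corpus=corpus, id2word=dictionary, num_topics=num_topics)
    elapsed = time.time() - start
    print(f"{name}: trained in {elapsed:.3f} s")
    for doc_bow, label in zip(corpus, labels):
        scores = model[doc_bow]  # list of (topic_id, weight) pairs
        dominant = max(scores, key=lambda s: abs(s[1]))[0] if scores else None
        print(f"  true emotion={label}, dominant topic={dominant}")
```

In a full evaluation, each discovered topic would additionally be mapped to an emotion label (for example, by majority vote over the training tweets assigned to it) so that the dominant-topic prediction can be scored against the true labels.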
