Sentiment analysis of tweets based on document vectors using supervised and semi-supervised learning

The growing use of the Internet in daily life has gone hand in hand with the spread of social media. Microblogging applications such as Facebook and Twitter let their users express momentary feelings and thoughts in short-form digital content including text, pictures, and video. Twitter is one of the most widely used of these applications. Tweets may carry positive or negative comments about a product, a service, or an individual, and determining the meaning and sentiment behind such comments has recently become a popular research topic. Reading thousands of comments about a product or service one by one and classifying their authors' opinions by hand takes considerable time and effort. Advances in machine learning and deep learning algorithms, together with the development of computer systems able to run them, have made sentiment classification feasible over millions of data items. In this study, we carried out sentiment classification on Turkish and English tweets. Using document vectors (Doc2Vec), we examined both how two document vector methods, DBoW (Distributed Bag of Words) and DM (Distributed Memory), perform and how semi-supervised and supervised learning affect the results. We report performance in terms of accuracy, precision, recall, specificity, and F-score. Semi-supervised learning outperformed supervised learning on both the Turkish and the English datasets.
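
As an illustration of the five reported metrics, the sketch below derives accuracy, precision, recall, specificity, and F-score from the confusion-matrix counts of a two-class sentiment task. It uses the standard textbook definitions and is not taken from the study's own code; the function name `evaluate`, the label strings, and the toy labels are assumptions made for this example.

```python
def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when the denominator is zero
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred, positive="positive"):
    """Compute five standard metrics for a two-class sentiment task.

    Hypothetical helper for illustration: any label other than
    `positive` is treated as the negative class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),
        # F-score here is the harmonic mean of precision and recall (F1)
        "f_score": _ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Toy labels invented for illustration only (not data from the study)
    gold = ["positive", "positive", "negative", "negative", "positive", "negative"]
    pred = ["positive", "negative", "negative", "positive", "positive", "negative"]
    for name, value in evaluate(gold, pred).items():
        print(f"{name}: {value:.3f}")
```

Specificity is computed alongside recall because the two describe complementary error types: recall measures how many positive tweets are found, and specificity measures how many negative tweets are correctly left out of the positive class.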

___

Balıkesir Üniversitesi Fen Bilimleri Enstitüsü Dergisi
  • ISSN: 1301-7985
  • Publication frequency: 2 issues per year
  • First published: 1999
  • Publisher: Balıkesir Üniversitesi