A Novel Sentiment Classification Model Based on Deep Feature Extraction for Turkish Tweets

The widespread use of social media applications has led people to generate new data every minute. Alongside audio and image data, text-based data is growing at a faster rate. Besides meaningful words, text-based data can contain many other kinds of content. In text-processing studies, this content is called noise and is removed from datasets during the text-preprocessing stage. In sentiment classification studies on Twitter datasets in particular, the data are cleaned of content such as URLs, punctuation marks, and emojis during preprocessing, before text representations are created. On Twitter, however, the content labeled as noise is, in a sense, part of the user's feelings and thoughts. In this study, features were also extracted from the noise removed from the dataset, which brought out the sentiment in tweets more clearly. The new sentiment classification model proposed in this study uses deep features extracted with deep learning methods together with features extracted manually from the content deleted during preprocessing. The proposed model was applied to a Turkish Twitter dataset previously studied in the literature. Experiments showed that the classification performance of the proposed model is better than that of previous studies.
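The idea of keeping, rather than discarding, the removed noise can be illustrated with a small Python sketch. It is not the authors' implementation: the regular expressions, the rough emoji ranges, and the feature names (`urls`, `emojis`, `exclamations`, `questions`, `other_punct`) are assumptions chosen for illustration. The hypothetical `split_tweet` function returns the cleaned text, which would go on to the representation step, together with counts of the URLs, emojis, and punctuation marks it stripped.

```python
import re
from collections import Counter
from typing import Dict, Tuple

# Illustrative patterns (assumptions, not the paper's exact preprocessing rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Rough emoji ranges: common pictographs plus misc symbols/dingbats; not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
PUNCT = ".,!?;:"

def split_tweet(text: str) -> Tuple[str, Dict[str, int]]:
    """Return the cleaned text plus counts of the 'noise' that was removed."""
    no_urls = URL_RE.sub(" ", text)  # drop URLs first so their dots are not counted
    punct = Counter(ch for ch in no_urls if ch in PUNCT)
    features = {
        "urls": len(URL_RE.findall(text)),
        "emojis": len(EMOJI_RE.findall(text)),
        "exclamations": punct["!"],
        "questions": punct["?"],
        "other_punct": sum(punct.values()) - punct["!"] - punct["?"],
    }
    # Conventional cleaning for the text-representation step: strip emojis and punctuation.
    cleaned = EMOJI_RE.sub(" ", no_urls)
    # \w is Unicode-aware for str patterns, so Turkish letters such as ç, ğ, ı, ö, ş, ü survive.
    cleaned = re.sub(r"[^\w\s]", " ", cleaned)
    return " ".join(cleaned.split()), features

text, noise = split_tweet("Harika bir gün!!! 😀 https://t.co/abc")
print(text)   # Harika bir gün
print(noise)  # {'urls': 1, 'emojis': 1, 'exclamations': 3, 'questions': 0, 'other_punct': 0}
```

In the example ("What a great day!!!" followed by an emoji and a link), a conventional pipeline would keep only `Harika bir gün`, while the sketch also retains that the tweet carried three exclamation marks and a smiling emoji. The text is deliberately not lowercased, because Python's default `str.lower()` does not follow Turkish casing rules for `I` and `İ`.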

A Novel Sentiment Classification Model Based on Deep Feature Extraction for Turkish Tweets

The widespread use of social media applications has led to a significant growth in the amount of data generated per person. Text-based data is growing faster than audio and image data. Besides meaningful words, text-based data may contain content such as punctuation marks (e.g., commas, periods, exclamation marks, and semicolons), emojis, and URLs. In text processing, such content is called noise and is removed from the dataset during preprocessing. In sentiment classification studies on Twitter datasets in particular, URLs, punctuation, and emojis are removed during preprocessing, before word embeddings are created. On Twitter, however, content treated as noise may actually be part of a user's sentiment or thought. In this study, we extract features from the noise removed from the dataset, which allows the sentiment in a tweet to be captured more accurately. We propose a novel sentiment classification model that combines deep features extracted by deep learning methods with features extracted manually from the content removed during preprocessing. The proposed model was evaluated on a Turkish Twitter dataset. In experiments on the same dataset, the proposed model achieved better classification performance than previous studies.
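The abstract does not state how the deep and hand-crafted features are combined. One common option, assumed here, is feature-level fusion by concatenation: the hand-crafted counts are min-max scaled with statistics from the training set, so that raw counts such as the number of exclamation marks do not dominate the deep features by scale alone, and are then appended to the deep feature vector before the final classifier. In this Python sketch, `deep` stands for a hypothetical vector taken from a trained network (for example, the output of a CNN or LSTM layer), and the function names `minmax_fit`, `minmax_apply`, and `fuse` are illustrative.

```python
from typing import List, Sequence, Tuple

def minmax_fit(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-column minimum and maximum of hand-crafted feature rows (fit on training data only)."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def minmax_apply(row: Sequence[float], mins: Sequence[float], maxs: Sequence[float]) -> List[float]:
    """Scale one row with the training statistics; constant columns map to 0.0.

    Values seen during training fall in [0, 1]; unseen extremes may fall outside it.
    """
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, lo, hi in zip(row, mins, maxs)
    ]

def fuse(deep: Sequence[float], manual: Sequence[float],
         mins: Sequence[float], maxs: Sequence[float]) -> List[float]:
    """Feature-level fusion: deep features followed by scaled hand-crafted features."""
    return list(deep) + minmax_apply(manual, mins, maxs)

# Hypothetical training rows of hand-crafted features: [urls, emojis, exclamations]
train_manual = [[0, 3, 1], [2, 0, 1], [1, 1, 1]]
mins, maxs = minmax_fit(train_manual)

# Hypothetical 2-dimensional deep feature vector for one tweet.
fused = fuse([0.25, -0.5], [1, 3, 1], mins, maxs)
print(fused)  # [0.25, -0.5, 0.5, 1.0, 0.0]
```

The fused vector would then be passed to whatever classifier sits on top; fitting the scaling statistics only on the training split keeps information from the test set out of the features.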

Fırat Üniversitesi Mühendislik Bilimleri Dergisi
  • ISSN: 1308-9072
  • Publication frequency: 2 issues per year
  • First published: 1987
  • Publisher: Fırat Üniversitesi