Apache Spark Based Sentiment Analysis

In this study, sentiment analysis is carried out using Apache Spark, an open-source framework that processes big data quickly through in-memory computation. Spark's MLlib machine learning library is used for the sentiment analysis. Logistic regression (LR), support vector machine (SVM), and Naive Bayes classification algorithms are applied, and their performance is evaluated according to different criteria.
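
The abstract does not say which evaluation criteria were used. As a minimal sketch, assuming the usual binary-classification measures (accuracy, precision, recall, and F1), the comparison of classifiers could be based on values computed as follows. The names `BinaryMetrics` and `evaluate_binary` are hypothetical and are not taken from the study; the code is plain Python rather than Spark so that it runs without a cluster.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    """Hypothetical container for four common evaluation measures."""
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate_binary(y_true: Sequence[int], y_pred: Sequence[int],
                    positive: int = 1) -> BinaryMetrics:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption: labels are integers and `positive` marks the positive
    sentiment class. Ratios with a zero denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the study.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 1, 0, 0, 0, 1, 1, 0]
    print(evaluate_binary(truth, predicted))
```

Running the same function on the predictions of each classifier (LR, SVM, Naive Bayes) over a shared test set would give directly comparable numbers under these assumed criteria.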

___

Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi
  • ISSN: 2687-3729
  • Publication frequency: 3 issues per year
  • First published: 2018
  • Publisher: Osmaniye Korkut Ata Üniversitesi