Apache Spark-Based Sentiment Analysis
In this study, sentiment analysis is performed using Apache Spark, an open-source framework that processes big data quickly through in-memory computation. The analysis uses MLlib, Spark's machine learning library, with three classification algorithms: logistic regression (LR), support vector machine (SVM), and Naive Bayes (NB). The algorithms are evaluated by their accuracy, precision, and sensitivity. The results show that SVM performs best on both datasets used in the study, achieving 91% and 88% accuracy, 91% and 90% precision, and 91% and 87% sensitivity, respectively.
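The three evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the metrics were computed in Spark, so this example assumes the standard binary definitions based on true/false positives and negatives; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision and sensitivity (recall) for binary labels.

    Assumption: standard confusion-matrix definitions; the study's exact
    evaluation procedure is not described in the abstract.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative

    total = tp + fp + tn + fn
    return {
        # share of all reviews classified correctly
        "accuracy": (tp + tn) / total if total else 0.0,
        # share of predicted positives that are truly positive
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # share of true positives that were found
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy data: 1 = positive review, 0 = negative review.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
    print(binary_metrics(y_true, y_pred))
    # {'accuracy': 0.625, 'precision': 0.6, 'sensitivity': 0.75}
```

On the toy data, the classifier gets 5 of 8 reviews right (accuracy 0.625), 3 of its 5 positive predictions are correct (precision 0.6), and it finds 3 of the 4 truly positive reviews (sensitivity 0.75), which shows how the three measures can diverge for the same model.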