Sentiment analysis of Twitter texts using machine learning algorithms

Over the last two decades, social media networks have become part of daily life. Gathering information from social media, tracking its trends, and learning the feelings and emotions people express there have become essential. In this study, sentiment analysis was performed on Twitter text to determine the subjective polarity of each post: positive, negative, or neutral. First, a public data set was obtained. Second, natural language processing techniques were applied to prepare the data for machine learning training. Finally, sentiment classification was performed with three machine learning algorithms. We reached 89% accuracy with Support Vector Machines, 88% with Random Forest, and 72% with a Gaussian Naive Bayes classifier.
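
The three stages described above (data, text preprocessing, classification) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the six toy tweets are invented, the cleaning steps (lowercasing, stripping URLs and @mentions, bag-of-words counts) are common choices rather than the steps used in the study, and the Gaussian Naive Bayes classifier is written from scratch in plain Python instead of using a library. Support Vector Machines and Random Forest are omitted for brevity.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and return word tokens."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet)
    return re.findall(r"[a-z']+", tweet)


def vectorize(tokens, vocabulary):
    """Turn a token list into a bag-of-words count vector over the vocabulary."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: one normal distribution per class and feature."""

    def fit(self, X, y, smoothing=1e-3):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        self.stats = {}
        for label, rows in grouped.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            # The smoothing term keeps variances strictly positive.
            variances = [
                sum((v - m) ** 2 for v in col) / n + smoothing
                for col, m in zip(zip(*rows), means)
            ]
            self.stats[label] = (math.log(n / len(X)), means, variances)
        return self

    def predict(self, row):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(row, means, variances)
            )
        return max(self.stats, key=log_posterior)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data; the study itself used a public Twitter data set.
train = [
    ("I love this great day", "positive"),
    ("What a great and happy win", "positive"),
    ("This is a terrible sad mess", "negative"),
    ("I hate this awful terrible deal", "negative"),
    ("The meeting is at noon today", "neutral"),
    ("Report released on Monday", "neutral"),
]
train_tokens = [preprocess(text) for text, _ in train]
train_labels = [label for _, label in train]
vocabulary = sorted({tok for tokens in train_tokens for tok in tokens})
X_train = [vectorize(tokens, vocabulary) for tokens in train_tokens]

model = GaussianNaiveBayes().fit(X_train, train_labels)
new_tweet = "Great day, I love it! https://t.co/x @user"
print(model.predict(vectorize(preprocess(new_tweet), vocabulary)))
```

On real data the labelled tweets would be split into training and test sets, and `accuracy` would be computed on the held-out portion; a figure measured on the training tweets themselves says little about generalization.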
