A New Technique for Sentiment Analysis System Based on Deep Learning Using Chi-Square Feature Selection Methods

Sentiment analysis means discovering and recognizing people's positive or negative feelings about an issue or product as expressed in text. A sentiment analysis system uses natural language processing techniques and a sentiment vocabulary network. The importance of sentiment analysis has grown alongside social media, including opinion polls, weblogs, Twitter and other social networks. Sentiment analysis is one of the applications of deep learning in NLP. The most common and successful type of recurrent neural network (RNN) is the long short-term memory (LSTM) network, and much research uses the ability of LSTM to analyze sentiment. However, large data volumes reduce the accuracy of LSTM networks on test data; in other words, over-fitting occurs. This problem arises when the independent variables are highly correlated with one another: the model may have low validity despite a high correlation coefficient between the independent and dependent variables. In other words, although the model looks good, it does not have significant independent variables. Combining the LSTM network with feature selection methods, which retain only the effective features, can address this problem and increase sentiment analysis accuracy. In this study, we review the state of the art to determine how previous research has addressed these tasks. We also propose combining the Chi-Square feature selection method with LSTM, Bi-LSTM and GRU models. The performance of each combination is measured and compared in terms of accuracy, precision, recall, and F1 score on two benchmark datasets, Yelp and US Airline. The results show that feature selection significantly increases classification accuracy in all cases. On the Yelp dataset, the maximum accuracy of 100% is attained by Bi-LSTM using Chi-Square with 500 features. On the US Airline dataset, the maximum accuracy of 97.9% is achieved by GRU-LSTM using Chi-Square with 20 features.
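To make the feature selection step concrete, the following is a minimal sketch of Chi-Square term ranking for a two-class sentiment corpus. It assumes the textbook 2x2 contingency form of the statistic (term present or absent versus positive or negative class) and plain token lists as input; the function names `chi_square_scores`, `select_top_k` and `keep_selected`, and the toy reviews, are hypothetical and are not taken from the paper, whose exact preprocessing may differ.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive_label):
    """Score each term by the 2x2 chi-square statistic between
    term presence and membership in the positive class."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive_label)
    n_neg = n - n_pos
    pos_with_term = Counter()  # positive documents containing the term
    neg_with_term = Counter()  # negative documents containing the term
    for tokens, y in zip(docs, labels):
        target = pos_with_term if y == positive_label else neg_with_term
        target.update(set(tokens))  # count each term once per document
    scores = {}
    for term in set(pos_with_term) | set(neg_with_term):
        a = pos_with_term[term]   # term present, positive class
        b = neg_with_term[term]   # term present, negative class
        c = n_pos - a             # term absent, positive class
        d = n_neg - b             # term absent, negative class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def keep_selected(docs, selected):
    """Drop every token that was not selected, preserving word order."""
    keep = set(selected)
    return [[t for t in tokens if t in keep] for tokens in docs]


# Toy data (illustrative only, not from the Yelp or US Airline datasets).
docs = [
    ["great", "food", "service"],
    ["great", "service"],
    ["bad", "food"],
    ["bad", "service", "slow"],
]
labels = ["pos", "pos", "neg", "neg"]

scores = chi_square_scores(docs, labels, positive_label="pos")
selected = select_top_k(scores, k=2)
reduced_docs = keep_selected(docs, selected)
```

In this sketch the reduced token sequences in `reduced_docs` would then be encoded and passed to a recurrent classifier such as LSTM, Bi-LSTM or GRU; the value of `k` plays the role of the "number of features" reported in the results (500 for Yelp, 20 for US Airline).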
