The impact of text preprocessing on the prediction of review ratings

With the growth of e-commerce platforms and online applications, businesses increasingly rely on rating and review systems to capture how customers feel about their products and services. Online ratings and reviews are known to attract new customers and increase sales by providing confidence, validation, opinions, comparisons, and evidence of merchant credibility. Although considerable research has been devoted to sentiment analysis for review classification, comparatively little attention has been paid to text preprocessing, a crucial step in opinion mining, particularly given that suitable preprocessing strategies may improve classification accuracy. In this paper, we focus on the impact of simple text preprocessing decisions on predicting fine-grained review rating stars, whereas most previous work has addressed the binary distinction between positive and negative reviews. The aim of this research is therefore to analyze preprocessing techniques and their influence, and to explain the notable observations and results concerning the performance of a five-class review rating classifier.
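The kind of comparison described above can be framed as a grid of on/off preprocessing decisions. The Python below is a minimal, hypothetical sketch, not the authors' pipeline: the three steps (lowercasing, punctuation removal, stopword removal), the function names `preprocess` and `all_configs`, and the tiny `STOPWORDS` set are all assumptions chosen for brevity. Each configuration's tokens would then be passed to a five-class (1 to 5 star) rating classifier, which is not shown here.

```python
import itertools
import string

# Hypothetical, deliberately tiny stopword list for illustration only;
# real experiments would use a curated list (e.g. from NLTK or spaCy).
STOPWORDS = {"the", "a", "an", "is", "was", "and", "or", "it", "this", "to", "of"}

# Translation table that deletes ASCII punctuation characters.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, lowercase=True, strip_punct=True, drop_stopwords=True):
    """Apply a chosen combination of simple preprocessing steps; return tokens."""
    if lowercase:
        text = text.lower()
    if strip_punct:
        text = text.translate(PUNCT_TABLE)
    tokens = text.split()  # naive whitespace tokenization
    if drop_stopwords:
        # Compare case-insensitively so the filter also works when lowercasing is off.
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return tokens


def all_configs():
    """Enumerate every on/off combination of the three steps (2**3 = 8 configs)."""
    names = ("lowercase", "strip_punct", "drop_stopwords")
    for values in itertools.product((False, True), repeat=len(names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    review = "The food was GREAT, but the service was slow!"
    for cfg in all_configs():
        print(cfg, preprocess(review, **cfg))
```

Enumerating every combination keeps the comparison systematic: each of the eight token streams can be vectorized and scored with the same classifier, so accuracy differences can be attributed to preprocessing alone. Many general-purpose stopword lists also include negations such as "not", which is one reason stopword removal may behave differently for rating prediction than for topic classification.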
