The impact of text preprocessing on the prediction of review ratings

With the growth of e-commerce platforms and online applications, businesses increasingly want rating and review systems that reveal how customers feel about their products and services. Statistics indicate that online ratings and reviews attract new customers and increase sales by providing confidence, ratification, opinions, comparisons, merchant credibility, etc. Although considerable research has been devoted to sentiment analysis for review classification, far less attention has been paid to text preprocessing, which is a crucial step in opinion mining, especially when suitable preprocessing strategies can increase classification accuracy. In this paper, we concentrate on the impact of simple text preprocessing decisions on the prediction of fine-grained review rating stars, whereas the majority of previous work focused on the binary distinction of positive vs. negative. The aim of this research is therefore to analyze preprocessing techniques and their influence on the performance of a five-class review rating classifier, and to explain the interesting observations and results.
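To make the idea of a "preprocessing decision" concrete, the following is a minimal sketch of a preprocessing function whose steps can be switched on and off independently. The abstract does not list the steps studied, so the three steps here (lowercasing, punctuation stripping, stopword removal), the function name `preprocess`, and the tiny stopword list are assumptions chosen for illustration. They are not the authors' pipeline.

```python
import re

# Hypothetical stopword list for illustration only; not taken from the paper.
STOPWORDS = {"the", "a", "an", "is", "was", "and", "it", "this"}


def preprocess(text, lowercase=True, strip_punct=True, drop_stopwords=True):
    """Tokenize a review, applying each preprocessing step only if enabled."""
    if lowercase:
        text = text.lower()
    if strip_punct:
        # Replace every character that is neither a word character
        # nor whitespace with a space.
        text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    if drop_stopwords:
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return tokens


review = "The food was GREAT, and the service is not bad!"
print(preprocess(review))                        # all steps enabled
print(preprocess(review, drop_stopwords=False))  # stopwords kept
```

Each combination of flags yields a different token sequence for the same review. A classifier trained on each variant can then be compared to see how a given decision affects five-class accuracy. Note that the assumed stopword list deliberately keeps "not": whether negation words are removed is itself a decision that may matter more for fine-grained ratings than for a binary positive vs. negative split.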

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6
  • Publisher: TÜBİTAK