CLASSIFICATION OF TURKISH TWEETS BY DOCUMENT VECTORS AND INVESTIGATION OF THE EFFECTS OF PARAMETER CHANGES ON CLASSIFICATION SUCCESS

Natural language processing is a field of artificial intelligence that has been gaining popularity in recent years. Given the ever-increasing volume of data in today's world, drawing emotional inferences from texts on a given topic and classifying documents are of great importance. Understanding and interpreting written text is a human ability; however, natural language processing, a subfield of machine learning and artificial intelligence, makes it possible to draw inferences from texts and to classify them. In this study, Turkish tweets were classified using the two methods of the algorithm known in the literature as document vectors, and the effect of changes in the methods' parameters on classification success was investigated. The study found that the DBoW (Distributed Bag of Words) method achieved higher accuracy than the DM (Distributed Memory) method, and that the DBoW-NS (Negative Sampling) architecture achieved higher accuracy than the other architectures.
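The abstract does not describe the training procedure itself. What follows is a minimal pure-Python sketch of the DBoW architecture with negative sampling, included only to make the idea concrete: each tweet's vector is trained to predict that tweet's own words (label 1) against a few randomly drawn noise words (label 0). It is not the authors' implementation. The function names, the default values for vector size, number of negative samples, epochs and learning rate, the toy tweets, and the nearest-centroid classifier are all assumptions for illustration, because the abstract names neither the parameter values nor the downstream classifier. In practice one would more likely use a library such as gensim's `Doc2Vec`, where `dm=0` selects DBOW, `dm=1` selects DM, and `hs=0` with `negative>0` selects negative sampling.

```python
import math
import random
from collections import Counter


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow for large magnitudes.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def _ns_step(d, target, word_out, weights, negative, lr, rng, update_words):
    """One negative-sampling update for document vector d and one true word."""
    ids = range(len(word_out))
    pairs = [(target, 1.0)]  # (word index, label): the tweet's own word is 1
    for n in rng.choices(ids, weights=weights, k=negative):
        if n != target:  # discard noise draws that hit the true word
            pairs.append((n, 0.0))
    grad_d = [0.0] * len(d)
    for j, label in pairs:
        u = word_out[j]
        score = sigmoid(sum(a * b for a, b in zip(d, u)))
        g = lr * (label - score)  # scaled gradient of the logistic log-likelihood
        for k in range(len(d)):
            grad_d[k] += g * u[k]
            if update_words:
                u[k] += g * d[k]
    for k in range(len(d)):
        d[k] += grad_d[k]


def train_dbow_ns(docs, dim=16, negative=5, epochs=100, lr=0.05, seed=0):
    """Train one vector per document (DBoW with negative sampling, a sketch)."""
    rng = random.Random(seed)
    counts = Counter(w for doc in docs for w in doc)
    vocab = sorted(counts)
    index = {w: i for i, w in enumerate(vocab)}
    # Noise distribution: unigram counts raised to the 3/4 power, as in word2vec.
    weights = [counts[w] ** 0.75 for w in vocab]
    doc_vecs = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in docs]
    word_out = [[0.0] * dim for _ in vocab]  # output (prediction) word vectors
    for _ in range(epochs):
        for d, doc in zip(doc_vecs, docs):
            for w in doc:
                _ns_step(d, index[w], word_out, weights, negative, lr, rng, True)
    return {"doc_vecs": doc_vecs, "word_out": word_out,
            "vocab": vocab, "weights": weights}


def infer_vector(tokens, model, negative=5, epochs=100, lr=0.05, seed=1):
    """Fit a vector for an unseen tweet while keeping the word vectors frozen."""
    rng = random.Random(seed)
    index = {w: i for i, w in enumerate(model["vocab"])}
    dim = len(model["word_out"][0])
    d = [(rng.random() - 0.5) / dim for _ in range(dim)]
    known = [index[w] for w in tokens if w in index]  # ignore unseen words
    for _ in range(epochs):
        for target in known:
            _ns_step(d, target, model["word_out"], model["weights"],
                     negative, lr, rng, False)
    return d


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def nearest_centroid(vec, doc_vecs, labels):
    """Hypothetical classifier: pick the label whose summed vector is closest."""
    centroids = {}
    for v, y in zip(doc_vecs, labels):
        c = centroids.setdefault(y, [0.0] * len(v))
        for k in range(len(v)):
            c[k] += v[k]  # a sum suffices: cosine similarity ignores scale
    return max(centroids, key=lambda y: cosine(vec, centroids[y]))


if __name__ == "__main__":
    # Hypothetical toy tweets and labels, not the study's dataset.
    tweets = [["harika", "güzel", "mutlu"], ["güzel", "harika", "sevindim"],
              ["kötü", "berbat", "üzgün"], ["berbat", "kötü", "sinirli"]]
    labels = ["pos", "pos", "neg", "neg"]
    model = train_dbow_ns(tweets)
    vec = infer_vector(["harika", "mutlu"], model)
    print(nearest_centroid(vec, model["doc_vecs"], labels))
```

The keyword arguments `dim`, `negative`, `epochs` and `lr` are the kind of method parameters whose variation the study examines. The DM method would instead combine the document vector with the vectors of surrounding context words to predict a target word, which this sketch does not implement.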
