Russia-Ukraine Conflict: A Text Mining Approach through Twitter

This study uses social media to investigate the Russia-Ukraine conflict. With the assent of the Russian parliament, Russian President Vladimir Putin announced on February 24, 2022 that Russia would begin invading Ukraine. Social media, particularly Twitter, has been heavily used during the conflict. For that reason, it has become a strong tool for supporting processes such as political decision making, organizing humanitarian activities, and providing assistance to victims. As a result, social media has become an up-to-date, comprehensive, and large-scale information source for analyzing the current situation. In the proposed study, a dataset of 65,412 tweets gathered between February 24 and April 5 is analyzed. First, a topic modeling method, Latent Dirichlet Allocation (LDA), is applied to extract the significant topics and the probability of each topic for each tweet. Then, using these probabilities, fuzzy c-means is applied to cluster the entire document collection, which yields seven distinct clusters. N-grams and network analysis are used to examine each resulting cluster in more detail. The study extracts worldwide public opinion, the current situation of civilians, the course of the conflict, and humanitarian issues during the Russia-Ukraine conflict.
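The clustering and n-gram steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the document-topic matrix `doc_topic` stands in for per-tweet LDA output and its values are made up, the sample tweets are invented, the fuzzifier `m = 2.0` and the toy cluster count of 3 are assumptions (the study reports seven clusters), and the function names `fuzzy_c_means` and `top_bigrams` are hypothetical. Only the Python standard library is used.

```python
import random
from collections import Counter


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            dist = [
                max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                for c in centers
            ]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            new_u.append([v / total for v in inv])
        # Stop once no membership value changes by more than tol.
        shift = max(
            abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new)
        )
        u = new_u
        if shift < tol:
            break
    return centers, u


def top_bigrams(texts, k=3):
    """Count adjacent word pairs (bigrams) over whitespace-tokenised texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(k)


# Hypothetical per-tweet topic probabilities (rows: tweets, columns: topics).
doc_topic = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.05, 0.05, 0.90],
    [0.05, 0.10, 0.85],
]
centers, memberships = fuzzy_c_means(doc_topic, n_clusters=3)

# Hard label per tweet: the cluster with the highest membership degree.
labels = [row.index(max(row)) for row in memberships]

# Invented example texts, used only to show the bigram count.
sample_tweets = ["humanitarian aid for ukraine", "humanitarian aid convoy"]
print(labels)
print(top_bigrams(sample_tweets, k=1))
```

Unlike hard clustering, each tweet keeps a membership degree in every cluster, so a tweet that mixes several topics is not forced into a single group. The hard labels above are only a convenience for grouping tweets before n-gram counting.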
