Topic Modeling of Twitter Data via RapidMiner

In this study, tweets containing specific words were first collected from Twitter using RapidMiner and pre-processed, and topic modeling was then applied to them. The “Search Twitter”, “Select Attributes”, and “Nominal to Text” operators were used for preprocessing. The preprocessed data was then analyzed using the “Tokenize”, “Aggregate”, and “Discretize” operators: the most frequently used words in the tweets were determined, and the words were grouped according to their frequency of use. Next, the study explains how to perform topic-based clustering on Twitter data using the “Extract Topics From Documents (LDA)” operator, which uses the Latent Dirichlet Allocation model. The most frequently used words and the number of tweets per user were examined through tables and graphs. In addition, a word cloud was created for each topic obtained from the topic modeling.
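The pipeline described above can be approximated outside RapidMiner. The following is a minimal Python sketch, not the study's implementation: it tokenizes tweets, counts word frequencies, bins words by frequency, and fits LDA with collapsed Gibbs sampling. The function names, the bin thresholds, the hyperparameters `alpha` and `beta`, and the sample tweets are all illustrative assumptions, and RapidMiner's LDA operator may use a different inference procedure.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(tweets):
    """Count how often each token occurs across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts


def discretize(counts, low_max=1, mid_max=2):
    """Group words into frequency bins (thresholds are hypothetical)."""
    bins = {"low": [], "medium": [], "high": []}
    for word, n in sorted(counts.items()):
        if n <= low_max:
            bins["low"].append(word)
        elif n <= mid_max:
            bins["medium"].append(word)
        else:
            bins["high"].append(word)
    return bins


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns per-topic word counts and per-document topic counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_word, doc_topic


if __name__ == "__main__":
    # Hypothetical tweets used only to exercise the functions.
    tweets = [
        "RapidMiner makes text mining easy",
        "Text mining of tweets with RapidMiner",
        "Topic modeling groups tweets by topic",
        "LDA topic modeling finds topics in text",
    ]
    print(discretize(word_frequencies(tweets)))
    topic_word, _ = lda_gibbs([tokenize(t) for t in tweets])
    for k, words in enumerate(topic_word):
        print(k, [w for w, _ in words.most_common(3)])
```

The top words per topic returned by `most_common` are the kind of input a word cloud would be built from. On a corpus this small the topics are not meaningful; the sketch only illustrates the sequence of steps.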
