Systematic Literature Review of Detecting Topics and Communities in Social Networks

The rapid growth and easy accessibility of the internet have turned widely used social media platforms such as Facebook, Instagram, Twitter, and LinkedIn into environments that hold very large volumes of data. This makes two kinds of applications necessary on these platforms: topic detection, so that users can easily reach the information they are looking for, and community detection, so that collective services can be offered to communities of individuals who share similar interests and opinions on the same subject. It is therefore vital for the effective use of these platforms that researchers study topic detection and community detection in social networks and develop methods and techniques for these problems. This study presents a systematic and in-depth literature review of studies that perform topic and community analysis on social media platforms, with the aim of providing a comprehensive overview of both areas. Most of the reviewed studies were selected from articles that use machine learning-based models, which are known to achieve successful results in practice. The analysis of these studies leads to two conclusions. In community detection, the Louvain method stands out for the performance values it achieves. In topic detection, no single model can be recommended on performance grounds; the appropriate model must be selected or built specifically for the problem at hand, taking all of its characteristics into account.
