Towards the design and implementation of an OSN crawler: a case of Turkish Facebook users

Online Social Networks (OSNs) are extremely popular services that allow users to interact with each other and share content. Because users share large amounts of data, OSNs are also rich data sources for research in social network analysis. Studying the usage of OSNs helps to understand users' content-sharing behavior and privacy concerns, and collecting data is a necessary first step. However, the Application Programming Interfaces (APIs) offered by OSN providers have several limitations that make it difficult to access secured information. In this paper, we present the design and implementation of an OSN crawler and discuss the challenges of this task and our workarounds for accessing public OSN data. Moreover, we analyze the collected data to characterize users' sharing behavior and discuss these analyses in detail from the perspective of individual privacy protection on OSNs. Our crawler overcomes most of the restrictions of OSN APIs and collects all forms of OSN user interactions as well as all public data posted on an OSN. Most existing studies collect OSN data using focused crawlers and are therefore capable of collecting only the desired type of data. Our crawler, on the other hand, provides a holistic view. On the popular OSN Facebook, our crawler captures user relationships such as kinship and friendship, and attributes such as profile items, events, posts, comments, replies, and metadata of activities (i.e., posting time, location, tagged users, etc.). To the best of our knowledge, ours is the most comprehensive OSN data collection effort and also the first study focused on the behavior of OSN users in Turkey.
