Towards the design and implementation of an OSN crawler: a case of Turkish Facebook users
Online Social Networks (OSNs) are extremely popular services that allow users to interact with each other and share content. Because users share large amounts of data on them, OSNs are also rich data sources for research in social network analysis. Studying how OSNs are used helps to understand users' content-sharing behavior and privacy concerns, and collecting data is a necessary first step towards doing so. However, the Application Programming Interfaces (APIs) provided by OSN providers have several limitations that make it difficult to access protected information. In this paper, we present the design and implementation of an OSN crawler, and discuss the challenges of this task and our workarounds for accessing public OSN data. Moreover, we analyze the collected data to characterize users' sharing behavior and discuss these analyses in detail from the perspective of individual privacy protection on OSNs. Our crawler overcomes most of the restrictions of OSN APIs and collects all forms of OSN user interactions as well as all public data posted on an OSN. Most existing studies collect OSN data using focused crawlers and are therefore capable of collecting only the desired type of data. Our crawler, on the other hand, provides a holistic view. On the popular Facebook OSN, our crawler captures user relationships such as kinship and friendship, and attributes such as profile items, events, posts, comments, replies, and activity metadata (e.g., posting time, location, and tagged users). To the best of our knowledge, ours is the most comprehensive OSN data collection effort and also the first study focused on the behavior of OSN users in Turkey.
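The abstract lists the kinds of data the crawler records for each Facebook user. As a rough illustration only, the Python sketch below models those records as dataclasses: relationships (friendship, kinship), profile items, events, posts with nested comments and replies, and per-activity metadata (posting time, location, tagged users). The class names, field names, and the `interaction_counts` helper are hypothetical; they are not taken from the crawler's actual implementation, whose storage format is not described here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class Relation(Enum):
    """Relationship types named in the abstract (hypothetical encoding)."""
    FRIEND = "friend"
    KIN = "kin"


@dataclass
class ActivityMeta:
    """Metadata attached to each activity: posting time, location, tagged users."""
    posted_at: datetime
    location: Optional[str] = None
    tagged_users: list[str] = field(default_factory=list)


@dataclass
class Reply:
    author_id: str
    text: str
    meta: ActivityMeta


@dataclass
class Comment:
    author_id: str
    text: str
    meta: ActivityMeta
    replies: list[Reply] = field(default_factory=list)  # replies nested under their comment


@dataclass
class Post:
    author_id: str
    text: str
    meta: ActivityMeta
    comments: list[Comment] = field(default_factory=list)  # comments nested under their post


@dataclass
class UserRecord:
    user_id: str
    profile: dict[str, str] = field(default_factory=dict)  # public profile items
    events: list[str] = field(default_factory=list)
    posts: list[Post] = field(default_factory=list)
    relations: dict[str, Relation] = field(default_factory=dict)  # other user id -> relation


def interaction_counts(user: UserRecord) -> dict[str, int]:
    """Count the interaction types stored for one user (hypothetical helper)."""
    n_comments = sum(len(p.comments) for p in user.posts)
    n_replies = sum(len(c.replies) for p in user.posts for c in p.comments)
    # Tags are counted on every activity: the post, its comments, and their replies.
    n_tags = sum(
        len(item.meta.tagged_users)
        for p in user.posts
        for item in [p, *p.comments, *(r for c in p.comments for r in c.replies)]
    )
    return {
        "posts": len(user.posts),
        "comments": n_comments,
        "replies": n_replies,
        "tags": n_tags,
        "events": len(user.events),
        "friends": sum(1 for r in user.relations.values() if r is Relation.FRIEND),
        "kin": sum(1 for r in user.relations.values() if r is Relation.KIN),
    }
```

For example, a record holding one post that tags one user, one comment on it that tags another user, and one reply to that comment yields one post, one comment, one reply, and two tags. Nesting comments under posts and replies under comments keeps the thread structure that a flat list of activities would lose.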