Your Username Can Give You Away: Matching Turkish OSN Users with Usernames

User profile matching (also called user cross-referencing or user identification) aims to find accounts that belong to the same user across different websites or online social networks (OSNs). Solving this problem is useful for many operations and functionalities, such as friend recommendation and link prediction across different OSNs. However, identifying users across different OSNs may also enable an adversary to aggregate information about users that is incomplete on any single site. An adversary can thereby extract and exploit users’ online footprints to violate their privacy and security, exposing them to threats such as identity theft, online stalking, and blackmail, among many others. Usernames are indispensable elements of all websites that require user registration. Even though usernames are generally short strings, they can reflect users’ characteristics and habits, such as political affiliation, hometown, and so on. In this study, we attempt to match users of distinct OSNs relying only on their usernames. We use two different approaches to build our learning function: one based on machine learning and one based on vector-based username similarity. We also explore different feature spaces from the literature and investigate which approach produces better results. We conducted our experiments on a real-world username data set extracted from the OSN accounts of Turkish users that we crawled in our previous work. Our results show that building the learning function by binary classification outperforms the similarity approach, achieving the best F-score of 0.921 without feature selection and extension.
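To make the vector-based similarity idea concrete, the following is a minimal sketch, not the method used in this study. It assumes usernames are represented as character bigram count vectors and compared with cosine similarity. The function names, the bigram choice, and the 0.5 decision threshold are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def char_ngrams(username: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams of a lowercased username.

    The n-gram representation is an assumption made for illustration.
    """
    s = username.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # A username shorter than n yields an empty vector.
        return 0.0
    return dot / (norm_a * norm_b)


def same_user(u1: str, u2: str, threshold: float = 0.5) -> bool:
    """Declare a match when similarity reaches a hypothetical threshold."""
    return cosine_similarity(char_ngrams(u1), char_ngrams(u2)) >= threshold
```

In a binary classification setting, a score like this could instead serve as one feature among many for a trained classifier, rather than being thresholded directly.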
