REGRESSION METHODS FOR SOCIAL MEDIA DATA ANALYSIS

In the early 2000s, the traditional modes of communication on mobile devices were voice calls, email, and the short message service (SMS). Today, much communication takes place through mobile applications such as WhatsApp, Facebook, Twitter, and Instagram. Facebook is the leading social network, with about 2.85 billion monthly active users. A user base of this size generates a large amount of data, and exploring these data provides insight into users’ activities, which can aid in tackling security challenges and in business planning, among other benefits. This study presents neighborhood component analysis (NCA) and Relief-based weight generation methods for a regression task on Facebook data. Features are computed from the generated weights and four widely used activation functions and are then fed to four regression models for prediction. The proposed model is used to predict nine continuous-valued attributes of the Facebook (FB) dataset. RMSE, R-squared, MSE, MAE, and training time were calculated and used as evaluation metrics for all nine cases. The average R-squared values of the Relief-based and NCA-based methods were 0.9689 and 0.9667, respectively. These results indicate that the proposed methods are efficient and accurate for regression tasks on Facebook data.
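The feature construction and evaluation steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state how weights and activations are combined or which four activation functions are used, so the sketch assumes each feature is multiplied by its weight before the activation is applied, and the activation set shown (sigmoid, tanh, ReLU, identity) is a hypothetical choice. The function names `transform_features` and `regression_metrics` are likewise illustrative. The metric formulas follow the standard definitions of MSE, RMSE, MAE, and R-squared.

```python
import math

# Hypothetical set of four activations; the abstract does not name them.
ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
    "identity": lambda z: z,
}


def transform_features(rows, weights, activation):
    """Scale each feature by its weight, then apply the activation.

    `weights` stands in for the per-feature weights that a method such as
    Relief or NCA would produce; the weight-times-feature form is an
    assumption made for this sketch.
    """
    f = ACTIVATIONS[activation]
    return [[f(w * x) for w, x in zip(weights, row)] for row in rows]


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R-squared for paired observations.

    Assumes the target is not constant, so the total sum of squares
    is non-zero.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


if __name__ == "__main__":
    # Toy values only; these are not taken from the Facebook dataset.
    features = transform_features([[1.0, -2.0], [0.5, 3.0]], [0.5, 2.0], "relu")
    print(features)
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
```

In a full pipeline, the transformed features would be passed to each regression model in turn, and the metrics would be computed on held-out predictions for each of the nine target attributes.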
